Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment

Jiaxiang Li · Siliang Zeng · Hoi To Wai · Chenliang Li · Alfredo Garcia · Mingyi Hong

Abstract
