

Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment

Jiaxiang Li ⋅ Siliang Zeng ⋅ Hoi To Wai ⋅ Chenliang Li ⋅ Alfredo Garcia ⋅ Mingyi Hong

Abstract
