Skip to yearly menu bar Skip to main content


Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Rafael Rafailov ⋅ Archit Sharma ⋅ Eric Mitchell ⋅ Stefano Ermon ⋅ Christopher Manning ⋅ Chelsea Finn

Abstract

Video

Chat is not available.