

Thomas: Learning to Explore Human Preference via Probabilistic Reward Model

Sang Truong ⋅ Duc Nguyen ⋅ Tho Quan ⋅ Sanmi Koyejo

