

Thomas: Learning to Explore Human Preference via Probabilistic Reward Model

Sang Truong · Duc Nguyen · Tho Quan · Sanmi Koyejo
