Poster

Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences

Andi Nika ⋅ Debmalya Mandal ⋅ Parameswaran Kamalaruban ⋅ Georgios Tzannetos ⋅ Goran Radanovic ⋅ Adish Singla
2024 Poster

Abstract
