

Poster

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF

Banghua Zhu ⋅ Michael Jordan ⋅ Jiantao Jiao
2024 Poster

Abstract
