Poster

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint

Wei Xiong ⋅ Hanze Dong ⋅ Chenlu Ye ⋅ Ziqi Wang ⋅ Han Zhong ⋅ Heng Ji ⋅ Nan Jiang ⋅ Tong Zhang
2024 Poster

Abstract
