It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF

TaiMing Lu ⋅ Lingfeng Shen ⋅ Xinyu Yang ⋅ Weiting Tan ⋅ Beidi Chen ⋅ Huaxiu Yao

Abstract
