It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF

TaiMing Lu · Lingfeng Shen · Xinyu Yang · Weiting Tan · Beidi Chen · Huaxiu Yao

Abstract
