

Reward Shaping to Mitigate Reward Hacking in RLHF

Jiayi Fu · Xuandong Zhao · Chengyuan Yao · Heng Wang · Qi Han · Yanghua Xiao

Abstract
