

Reward Shaping to Mitigate Reward Hacking in RLHF

Jiayi Fu ⋅ Xuandong Zhao ⋅ Chengyuan Yao ⋅ Heng Wang ⋅ Qi Han ⋅ Yanghua Xiao

Abstract
