Inference-Time Reward Hacking in Large Language Models

Hadi Khalaf · Claudio Mayrink Verdun · Alex Oesterling · Himabindu Lakkaraju · Flavio Calmon

Abstract
