Inference-Time Reward Hacking in Large Language Models

Hadi Khalaf ⋅ Claudio Mayrink Verdun ⋅ Alex Oesterling ⋅ Himabindu Lakkaraju ⋅ Flavio Calmon

Abstract
