Poster

Reward Hacking Benchmark: Measuring Exploits in LLM Agents with Tool Use

Kunvar Thaman

Abstract
