Poster

Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis

Darshan Deshpande ⋅ Anand Kannappan ⋅ Rebecca Qian

Abstract
