Poster

Exploration Hacking: LLMs Can Learn to Resist RL Training

Eyon Jang ⋅ Damon Falck ⋅ Joschka Cedric Braun ⋅ Nathalie Kirch ⋅ Achyutha Menon ⋅ Perusha Moodley ⋅ Scott Emmons ⋅ Roland S. Zimmermann ⋅ David Lindner

Abstract
