

Poster Thu, Jul 9, 2026 • 5:00 PM – 6:45 PM KST HALL A #3212

One Bias After Another: Mechanistic Reward Shaping and Persistent Biases in Language Reward Models

Daniel Fein ⋅ Max Lamparth ⋅ Violet Xiang ⋅ Mykel Kochenderfer ⋅ Nick Haber

Abstract
