Poster

Spurious Rewards: Rethinking Training Signals in RLVR

Rulin Shao ⋅ Stella Li ⋅ Rui Xin ⋅ Scott Geng ⋅ Yiping Wang ⋅ Sewoong Oh ⋅ Simon Du ⋅ Nathan Lambert ⋅ Sewon Min ⋅ Ranjay Krishna ⋅ Yulia Tsvetkov ⋅ Hannaneh Hajishirzi ⋅ Pang Wei Koh ⋅ Luke Zettlemoyer

Abstract
