Poster

Probing RLVR Training Instability through the Lens of Objective-Level Hacking

Yiming Dong ⋅ Kun Fu ⋅ Haoyu Li ⋅ Xinyuan Zhu ⋅ Yurou Liu ⋅ Lijing Shao ⋅ Jieping Ye ⋅ Zheng Wang

Abstract
