Poster Thu, Jul 9, 2026 • 1:00 AM – 2:45 AM PDT HALL A #300

Breaking the Self-Confirming Loop: Diagnosing and Mitigating Systemic Reward Bias in Self-Rewarding RL

Chuyi Tan ⋅ Peiwen Yuan ⋅ Xinglin Wang ⋅ Yiwei Li ⋅ Shaoxiong Feng ⋅ Yueqi Zhang ⋅ Jiayi Shi ⋅ Ji Zhang ⋅ Boyuan Pan ⋅ Yao Hu ⋅ Kan Li

Abstract
