Poster

Beyond Majority Voting: Self-Reflective Test-Time Reinforcement Learning for LLM Reasoning

Sitong Wu ⋅ Haoru Tan ⋅ Xichen Zhang ⋅ Bin Xia ⋅ Shaofeng Zhang ⋅ Xiaojuan Qi ⋅ Bei Yu ⋅ Jiaya Jia

Abstract
