Poster

T$^2$PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning

Haixin Wang ⋅ Hejie Cui ⋅ Chenwei Zhang ⋅ Xin Liu ⋅ Shuowei Jin ⋅ Shijie Geng ⋅ Xinyang Zhang ⋅ Nasser Zalmout ⋅ Zhenyu Shi ⋅ Yizhou Sun

Abstract
