Poster Wed, Jul 8, 2026 • 1:00 AM – 2:45 AM PDT HALL A #1808

The Optimal Token Baseline: Variance Reduction for Long-Horizon LLM-RL

Yingru Li ⋅ Jiawei Xu ⋅ Ziniu Li ⋅ Jiacai Liu ⋅ Wei Liu ⋅ Yuxuan Tong ⋅ Longtao Zheng ⋅ Zhenghai Xue ⋅ Yaxiang Zhang ⋅ Tianle Cai ⋅ Ge Zhang ⋅ Qian Liu ⋅ Baoxiang Wang

Abstract
