Poster

The Optimal Token Baseline: Variance Reduction for Long-Horizon LLM-RL

Yingru Li ⋅ Jiawei Xu ⋅ Ziniu Li ⋅ Jiacai Liu ⋅ Wei Liu ⋅ Yuxuan Tong ⋅ Longtao Zheng ⋅ Zhenghai Xue ⋅ Yaxiang Zhang ⋅ Tianle Cai ⋅ Ge Zhang ⋅ Qian Liu ⋅ Baoxiang Wang

Abstract
