Poster

DisPPO: Quantile-Based Distributional Reinforcement Learning for Large Language Models

Zhijian Zhou ⋅ Long Li ⋅ Xuan Zhang ⋅ Zongkai Liu ⋅ Yanting Miao ⋅ Yuchen Liu ⋅ Deshu Chen ⋅ Ke Li ⋅ Xing Sun ⋅ Ruoxi Jiang ⋅ Xiaoyu Tan ⋅ Chao Qu ⋅ Yuan Qi

Abstract
