Poster

Small Generalizable Prompt Predictive Models Can Steer Efficient RL Post-Training of Large Reasoning Models

Yun Qu ⋅ Cheems Wang ⋅ Yixiu Mao ⋅ Heming Zou ⋅ Yuhang Jiang ⋅ Weijie Liu ⋅ Clive Bai ⋅ Kai Yang ⋅ Yangkun Chen ⋅ Saiyong Yang ⋅ Xiangyang Ji

Abstract
