Poster

Confidence and Difficulty-Adaptive Policy Optimization for LLM Reasoning

Zhanke Zhou ⋅ Xiangyu Lu ⋅ Chentao Cao ⋅ Brando Miranda ⋅ Tongliang Liu ⋅ Bo Han ⋅ Sanmi Koyejo

Abstract
