

OctoThinker: Mid-Training Incentivizes Reinforcement Learning Scaling

Zengzhi Wang ⋅ Fan Zhou ⋅ Xuefeng Li ⋅ Pengfei Liu

Abstract
