OctoThinker: Mid-Training Incentivizes Reinforcement Learning Scaling

Zengzhi Wang · Fan Zhou · Xuefeng Li · Pengfei Liu

Abstract
