

Learning to Reason by Failing: Offline RL on Sub-optimal Rollouts Scales Synthetic Data by 8x

Amrith Setlur ⋅ Saurabh Garg ⋅ Xinyang Geng ⋅ Naman Garg ⋅ Virginia Smith ⋅ Aviral Kumar

Abstract
