Learning to Reason by Failing: Offline RL on Sub-optimal Rollouts Scales Synthetic Data by 8x

Amrith Setlur · Saurabh Garg · Xinyang Geng · Naman Garg · Virginia Smith · Aviral Kumar

Abstract
