One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models
Abstract
Looped transformers scale computational depth independently of parameter count by repeatedly applying the same layer. However, training these models over long horizons creates significant optimization challenges: without additional supervision, it is difficult for a looped transformer that starts from noise to steer towards a potentially complex output. Diffusion models tackle this issue by corrupting data with varying magnitudes of noise and training the model to reverse the corruption in a single step, but this process misaligns training and test-time behaviour. We introduce Denoising Recursion Models, a method that similarly corrupts data with noise but trains the model to reverse the corruption over multiple recursive steps. This strategy provides a tractable curriculum of intermediate states while better aligning training with testing and incentivizing non-greedy, forward-looking generation. In extensive experiments, this approach outperforms the Tiny Recursion Model (TRM) on ARC-AGI, the benchmark on which TRM recently achieved breakthrough performance.
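The training objective sketched in the abstract can be illustrated with a minimal toy example: corrupt the target with a random noise magnitude (as in diffusion training), then ask a single shared module to reverse the corruption over k recursive steps rather than one. All function and variable names below are illustrative assumptions, not the paper's actual API, and a small MLP stands in for the looped transformer.

```python
import torch
import torch.nn as nn

def denoising_recursion_loss(model: nn.Module, x: torch.Tensor,
                             y: torch.Tensor, k: int = 4) -> torch.Tensor:
    """Hypothetical sketch: corrupt the target y with a random noise level,
    then refine the corrupted state over k recursive applications of the
    same model, supervising only the final state against the clean target."""
    # Sample a per-example corruption magnitude, as in diffusion training.
    t = torch.rand(y.shape[0], 1)
    noise = torch.randn_like(y)
    z = (1 - t) * y + t * noise  # noised version of the target
    # Recursive refinement: every step reuses the same weights (a "loop").
    for _ in range(k):
        z = model(torch.cat([x, z], dim=-1))
    # Train the k-step denoised state to match the clean target.
    return nn.functional.mse_loss(z, y)

# Toy usage: a tiny shared MLP in place of the looped transformer block.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
x = torch.randn(4, 8)  # conditioning input
y = torch.randn(4, 8)  # clean target
loss = denoising_recursion_loss(model, x, y, k=3)
loss.backward()
```

Because the loss is taken only after k recursive steps, gradients flow through every intermediate state, which is what incentivizes the non-greedy, forward-looking refinement the abstract describes.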