Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning
Benhao Huang ⋅ Zhengyang Geng ⋅ Zico Kolter
Abstract
Reasoning is central to building intelligent systems that can solve unseen problems beyond training. Yet we still lack a principled understanding of what internal mechanism enables neural networks to generalize reasoning beyond memorized patterns. We hypothesize that generalizable reasoning emerges through learning task-conditioned attractors. Concretely, the model learns a latent dynamical system whose fixed points correspond to valid solutions. We term models that reason by converging to such task-conditioned fixed points *Equilibrium Reasoners (EqR)*. This attractor view elucidates when and how to scale test-time compute. Empirically, improvements from scaling test-time compute are tightly coupled with convergence to attractors. By shaping a more favorable attractor landscape and leveraging stochasticity, EqR improves convergence and scales reliably at test time. Our models scale along two axes: *depth* by running more solver steps, and *width* by aggregating stochastic trajectories from multiple random initializations. As we scale test-time compute by $8192\times$, with a maximum effective depth surpassing 300,000 layers when unrolled, reasoning accuracy rises from 8\% to over 99\% on Sudoku-Extreme. We hope our attractor perspective sheds light on scalable reasoning through test-time computation.
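The two scaling axes described above can be illustrated with a toy sketch. This is not the paper's model; it is a minimal, hypothetical example using a contractive scalar map (so a unique fixed point exists), where `solve_depth` scales depth by running more solver steps and `solve_width` scales width by aggregating trajectories from random initializations. All function names are invented for illustration.

```python
import math
import random

def step(z, x):
    # Toy task-conditioned update: contractive in z since |d/dz 0.5*cos(z)| <= 0.5 < 1,
    # so repeated application converges to a unique fixed point z* = step(z*, x).
    return 0.5 * math.cos(z) + x

def solve_depth(x, n_steps=100, z0=0.0):
    # Depth scaling: more solver steps means a longer unrolled trajectory
    # toward the attractor.
    z = z0
    for _ in range(n_steps):
        z = step(z, x)
    return z

def solve_width(x, n_inits=8, n_steps=100, seed=0):
    # Width scaling: run several trajectories from random initializations
    # and aggregate (here, a simple average of the final iterates).
    rng = random.Random(seed)
    finals = [solve_depth(x, n_steps, z0=rng.uniform(-2.0, 2.0))
              for _ in range(n_inits)]
    return sum(finals) / len(finals)

x = 0.3
z_star = solve_depth(x)
# A converged solution satisfies the fixed-point condition z* = step(z*, x).
residual = abs(z_star - step(z_star, x))
```

Because this toy map is contractive, every initialization reaches the same attractor, so the width-aggregated answer matches the depth-only answer; in a learned system with multiple basins, aggregation instead serves to select among candidate fixed points.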