Thinking in Flow: A Dissipative Stabilization Operator for Robust Autoregressive Reasoning
Abstract
Chain-of-Thought (CoT) prompting enables multi-step reasoning in large language models, yet long-horizon generation remains brittle under distribution shift and context interference: irrelevant cues persist, small deviations compound into inference drift, and late-stage corrections can destabilize the trajectory. We recast autoregressive decoding as a perturbed long-horizon dynamical system and introduce an inference-time stabilization operator that targets trajectory-level reliability rather than token-level fluency. Specifically, we propose ODE-guided language models, which augment a base Transformer with a persistent continuous-time thought state whose dynamics are explicitly designed to be dissipative, enabling stable evidence accumulation with controlled forgetting. Instantiating this framework, Thinking in Flow (TiF) equips the model with a lightweight Neural ODE controller and injects its output through post-norm residual updates to achieve numerically stable, low-intrusion steering. A demand--supply (uncertainty--capacity) gate determines when intervention is warranted, while a direction gate determines how to steer in representation space, yielding selective, do-no-harm corrections instead of persistent bias. We establish well-posedness, dissipativity, and incremental stability of the controlled thought dynamics, which together imply bounded interventions over arbitrarily long contexts. Empirically, TiF improves robustness to distractions and semantic perturbations while matching or improving accuracy on mathematical reasoning benchmarks across both the Llama and Qwen model families; we further observe gains on non-mathematical BBH reasoning tasks when TiF is trained on Llama.
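
To make the mechanism described above concrete, the following is a minimal sketch, not the paper's implementation: a dissipative thought-state update with demand--supply and direction gating, assuming a hypothetical hidden size d_model, an explicit-Euler discretization of the Neural ODE controller, and illustrative module and parameter names throughout.

import torch
import torch.nn as nn
import torch.nn.functional as F


class DissipativeThoughtState(nn.Module):
    """Toy controller: dz/dt = -softplus(decay) * z + g_when * g_dir * f(z, h)."""

    def __init__(self, d_model: int):
        super().__init__()
        # Lightweight vector field of the ODE controller (illustrative form).
        self.f = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Tanh())
        self.log_decay = nn.Parameter(torch.zeros(d_model))     # dissipation rate
        self.demand_gate = nn.Linear(d_model, 1)                # "when to intervene" gate
        self.direction_gate = nn.Linear(d_model, d_model)       # "how to steer" gate
        self.norm = nn.LayerNorm(d_model)                       # post-norm residual injection

    def forward(self, z, h, dt: float = 0.1):
        # z: persistent continuous-time thought state; h: base-LM hidden state.
        g_when = torch.sigmoid(self.demand_gate(h))              # is intervention warranted?
        g_dir = torch.tanh(self.direction_gate(h))               # steering direction in representation space
        drift = -F.softplus(self.log_decay) * z                  # contractive term (dissipation)
        control = g_when * g_dir * self.f(torch.cat([z, h], dim=-1))
        z_next = z + dt * (drift + control)                      # one explicit-Euler ODE step
        h_steered = self.norm(h + z_next)                        # low-intrusion post-norm update
        return z_next, h_steered

In this toy version, the -softplus(decay) * z drift is what makes the dynamics dissipative: it contracts the thought state, so the gated corrections stay bounded over long contexts, mirroring the stability properties the abstract claims for the actual TiF controller.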