Stop When Further Reasoning Won’t Help: Attention-State Adaptive Generation in Reasoning Models
Abstract
By incorporating test-time compute scaling, large reasoning models (LRMs) can solve complex problems by generating explicit chain-of-thought (CoT) reasoning. However, they often suffer from overthinking during generation, producing redundant tokens and degraded accuracy. Existing methods to mitigate this issue remain limited: training-based approaches incur substantial training costs, while training-free methods often rely on carefully crafted prompting or unreliable confidence signals. In this work, we study early stopping through attention distributions and propose a simple method, ASAG (Attention-State Adaptive Generation), that infers the model's reasoning state and adaptively adjusts the generation strategy. The proposed method is training-free and plug-and-play, enabling seamless integration into existing LRMs. Extensive experiments on nine benchmarks demonstrate consistent improvements across mainstream LRMs of varying parameter scales, including the DeepSeek-R1-Distill and Qwen3 series. In particular, ASAG achieves a 4.4% relative improvement in accuracy while reducing the number of generated tokens by over 40% across all reasoning tasks on Qwen3-8B.
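To make the idea of attention-state-based early stopping concrete, the following is a minimal, hypothetical sketch, not the paper's actual algorithm: it monitors the entropy of recent attention distributions and signals a stop once attention stays sharply concentrated (low entropy) for several consecutive decoding steps, on the assumption that a settled attention state indicates further reasoning won't help. The function names, `window`, and `threshold` values are illustrative choices, not details from the paper.

```python
import math

def attention_entropy(attn_weights):
    """Shannon entropy (nats) of one attention distribution.

    `attn_weights` is a list of non-negative probabilities summing to 1;
    low entropy means attention is concentrated on a few tokens.
    """
    return -sum(p * math.log(p) for p in attn_weights if p > 0)

def should_stop(attn_history, window=3, threshold=0.5):
    """Heuristic early-stop signal (illustrative, not ASAG itself).

    Returns True if the attention entropy has stayed below `threshold`
    for the last `window` decoding steps, i.e. the model's attention
    state appears to have converged.
    """
    if len(attn_history) < window:
        return False
    return all(attention_entropy(a) < threshold
               for a in attn_history[-window:])

# Example: a sharply peaked distribution vs. a uniform one.
sharp = [0.97, 0.01, 0.01, 0.01]   # low entropy: attention has settled
flat = [0.25, 0.25, 0.25, 0.25]    # high entropy: still "exploring"

print(should_stop([sharp, sharp, sharp]))  # stop: entropy stayed low
print(should_stop([flat, flat, flat]))     # keep generating
```

In a real decoding loop, such a check would run once per generated token, with per-head or per-layer attention aggregated before the entropy computation; the right aggregation and threshold would need to be determined empirically.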