When Simple Problems Wear Complex Costumes: Improving Efficiency in LRMs' Adaptive Reasoning
Abstract
Recent Large Reasoning Models (LRMs) have demonstrated powerful multi-step problem-solving capabilities, but they often suffer from inefficiency due to an ``overthinking'' phenomenon, in which they apply complex reasoning to simple tasks and incur unnecessary computational cost and latency. Adaptive reasoning models, which can switch between generating explicit reasoning and producing direct answers, offer a potential solution, but their effectiveness is compromised by a critical flaw: they are often misled by superficial linguistic complexity, mistaking verbosely phrased simple problems for complex ones. To address this, we propose a two-stage training framework that produces a more robust adaptive reasoner. The first stage uses supervised fine-tuning on augmented data that presents simple problems in both concise and redundant forms, teaching the model to ignore superficial verbosity. The second stage applies reinforcement learning with Group Relative Policy Optimization (GRPO) and a custom reward function to refine the model's adaptive policy, ensuring that it selects a reasoning mode based on true task complexity rather than surface-level cues. The resulting model reduces computational overhead without sacrificing accuracy and demonstrates improved robustness to misleading linguistic cues.
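As a hedged illustration of the reward design (a sketch under assumed notation, not the paper's exact formulation), one plausible complexity-aware reward for a sampled response $y$ to input $x$ is
\[
r(x, y) \;=\; \mathbb{1}\!\left[\mathrm{correct}(y)\right] \;-\; \lambda\, \mathbb{1}\!\left[\mathrm{mode}(y) \neq \mathrm{mode}^{\star}(x)\right],
\]
where $\mathrm{mode}(y) \in \{\text{think}, \text{direct}\}$ is the reasoning mode the model chose, $\mathrm{mode}^{\star}(x)$ is the mode matched to the true complexity of $x$ (so explicit reasoning on a verbosely phrased but simple problem is penalized), and $\lambda > 0$ trades off correctness against mode mismatch; the symbols $r$, $\mathrm{mode}^{\star}$, and $\lambda$ here are illustrative assumptions. GRPO then converts the rewards $r_1, \dots, r_G$ of a group of $G$ sampled responses into group-relative advantages, $A_i = \left(r_i - \mathrm{mean}(r_{1:G})\right) / \mathrm{std}(r_{1:G})$, without requiring a learned value function.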