Intervene When It Doubts: Conjunction-Guided Interactive Reasoning
Abstract
Large Reasoning Models (LRMs) excel at complex reasoning but suffer from inefficient reasoning, such as overthinking and overshoot. These issues stem from excessive or misdirected reasoning triggered by the model's "doubt", which manifests as self-validation and exploratory extension, increasing computational cost and degrading performance. Existing efficient-reasoning methods regulate reasoning via internal signals or static schedules, but they are not tailored to the "doubt" behavior of LRMs. To address this, we propose Conjunction-Guided Intervention (CGI), a reasoning framework that intervenes when the model shows signs of doubt. Our key insight is that overthinking and overshoot in LRMs arise from conjunction-triggered extensions: LRMs signal "doubt" through transitional conjunctions and then extend into redundant self-validation and exploration without timely state-based correction. Building on this insight, CGI pauses reasoning at doubt-signaling conjunction markers to obtain external state-based feedback and adaptively extends or terminates reasoning, reducing redundancy while preserving accuracy. The feedback is generated by evaluating two criteria, rationality and completeness, and can come from either humans or LLM proxies. We train the target model with Group Relative Policy Optimization (GRPO) so that it adapts to this interactive mode. Experiments show that our framework achieves a superior balance between accuracy and reasoning length.