DiffuReason: Enhancing Reasoning Ability for Diffusion Language Models via Monte Carlo Tree Search
Abstract
Auto-Regressive (AR) models combined with Monte Carlo Tree Search (MCTS) are a dominant paradigm for achieving “System 2” reasoning. However, this approach suffers from significant latency because AR models generate serially, token by token. In contrast, Diffusion Large Language Models (dLLMs) offer inherent speed advantages through parallel sequence generation, yet they often struggle with accuracy on complex reasoning because they lack rigorous search, evaluation, and revision capabilities. Applying MCTS directly to diffusion models faces an architectural barrier: the denoising process lacks the discrete decision steps that tree search naturally relies on. To improve reasoning ability while retaining efficiency, we propose DiffuReason, a Monte Carlo tree search reasoning algorithm for diffusion models. By modeling generation as a Markov Decision Process (MDP), DiffuReason discretizes the continuous diffusion flow into searchable thought blocks. During reverse generation, DiffuReason recursively performs four MCTS-style stages: selecting the best node (block), expanding it to obtain candidate nodes, simulating to evaluate node values, and revising unsatisfactory nodes. Experiments on mathematical reasoning benchmarks demonstrate that DiffuReason significantly improves the reasoning ability of diffusion models and achieves a superior balance between accuracy and efficiency, even compared with auto-regressive models.
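The four stages named above (select, expand, simulate, revise) can be illustrated with a generic MCTS loop over thought blocks. This is a minimal sketch under our own assumptions, not DiffuReason's implementation. `propose_block` is a hypothetical stand-in for a dLLM denoising one masked block in parallel. `value_fn` is a hypothetical stand-in for whatever evaluator scores a partial trace. Selection uses the standard UCT rule. "Revise" is modeled, as an assumption, as re-denoising the newest block while its value stays below a threshold. All names and hyperparameters (`block_mcts`, `revise_below`, `max_revisions`, `toy_propose`, `toy_value`) are illustrative only.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Block = str  # one "thought block": a span of tokens denoised in parallel (assumption)

@dataclass
class Node:
    blocks: List[Block]                 # committed blocks from the root to this node
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

def uct(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """Upper Confidence bound for Trees; unvisited children are tried first."""
    if child.visits == 0:
        return float("inf")
    exploit = child.value_sum / child.visits
    explore = c * math.sqrt(math.log(max(parent_visits, 1)) / child.visits)
    return exploit + explore

def block_mcts(propose_block: Callable[[List[Block], random.Random], Block],
               value_fn: Callable[[List[Block]], float],
               n_blocks: int = 3, n_candidates: int = 3, n_iters: int = 30,
               revise_below: float = 0.2, max_revisions: int = 2,
               seed: int = 0) -> Optional[List[Block]]:
    rng = random.Random(seed)
    root = Node(blocks=[])
    best_value, best_blocks = -math.inf, None
    for _ in range(n_iters):
        # 1. Select: descend by UCT until reaching a leaf.
        node = root
        while node.children:
            parent_visits = node.visits
            node = max(node.children, key=lambda ch: uct(ch, parent_visits))
        # 2. Expand: generate n_candidates alternative next blocks.
        if len(node.blocks) < n_blocks:
            for _ in range(n_candidates):
                child_blocks = node.blocks + [propose_block(node.blocks, rng)]
                node.children.append(Node(blocks=child_blocks, parent=node))
            node = rng.choice(node.children)
        # 3. Simulate: score the partial or complete trace.
        value = value_fn(node.blocks)
        # 4. Revise (hypothetical rule): regenerate the newest block while unsatisfactory.
        for _ in range(max_revisions):
            if value >= revise_below or not node.blocks:
                break
            node.blocks[-1] = propose_block(node.blocks[:-1], rng)
            value = value_fn(node.blocks)
        # Remember the best complete trace seen so far.
        if len(node.blocks) == n_blocks and value > best_value:
            best_value, best_blocks = value, list(node.blocks)
        # Backpropagate the value to every ancestor.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    return best_blocks

def toy_propose(prefix: List[Block], rng: random.Random) -> Block:
    # Toy stand-in for parallel denoising of one block.
    return rng.choice(["good", "bad"])

def toy_value(blocks: List[Block]) -> float:
    # Toy stand-in verifier: fraction of blocks that are "good".
    return sum(b == "good" for b in blocks) / max(len(blocks), 1)

if __name__ == "__main__":
    print(block_mcts(toy_propose, toy_value))
```

Because leaves at full depth are never expanded, selection always ends at a leaf, so revising that leaf's newest block in place never invalidates the prefixes of existing children.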