Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models
Jinbin Bai ⋅ Yixuan Li ⋅ Yuchen Zhu ⋅ Yi Xin ⋅ Qingyu Shi ⋅ Aosong Feng ⋅ Xiaohong Liu ⋅ Molei Tao ⋅ Jianru Xue ⋅ Xiangtai Li ⋅ Ming-Hsuan Yang
Abstract
Inference-time compute has re-emerged as a practical way to improve LLM reasoning. Most test-time scaling (TTS) algorithms rely on autoregressive decoding, which is ill-suited to discrete diffusion language models (dLLMs) because they decode the entire sequence in parallel. As a result, developing effective and efficient TTS methods that unlock the full generative potential of dLLMs remains an underexplored challenge. To address this, we propose \textbf{LLaDA-S}, an efficient TTS framework for dLLMs that (i) performs \textbf{Hierarchical Trajectory Search} (HTS), which dynamically prunes and reallocates compute within an early-to-mid denoising window, (ii) replaces external verifiers with \textbf{Self-Verified Feedback} (SVF) obtained via self-evaluation prompts on intermediate completions, and (iii) introduces \textbf{local branching with partial remasking} to explore diverse implementations while preserving high-confidence tokens. Across four mathematical reasoning and code generation benchmarks on three dLLMs (LLaDA 8B Instruct, Dream 7B Instruct, and LLaDA 2.0-mini), LLaDA-S achieves a favorable performance-efficiency trade-off, matching best-of-$N$ performance with substantially fewer function evaluations (NFEs). The code will be released.
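To make the search loop described above concrete, here is a minimal toy sketch of the prune-then-branch pattern: maintain a pool of denoising trajectories, score intermediate completions with a self-verification signal inside an early-to-mid window, prune the weakest, and reallocate the freed compute by branching survivors via partial remasking of low-confidence tokens. Every name and value below (`Trajectory`, `denoise_step`, `self_verify_score`, `partial_remask`, the window bounds) is a hypothetical stand-in for illustration, not the authors' implementation or API.

```python
# Toy sketch of the LLaDA-S-style TTS loop; all components are dummy stand-ins.
import random

random.seed(0)

class Trajectory:
    """A partially denoised sequence with per-token confidences (dummy state)."""
    def __init__(self, tokens=None, conf=None):
        self.tokens = tokens or ["[MASK]"] * 16
        self.conf = conf or [0.0] * 16

def denoise_step(traj):
    # Stand-in for one parallel denoising step of a dLLM: unmask a few
    # positions and assign them confidences.
    for i, t in enumerate(traj.tokens):
        if t == "[MASK]" and random.random() < 0.3:
            traj.tokens[i] = f"tok{i}"
            traj.conf[i] = random.random()
    return traj

def self_verify_score(traj):
    # Stand-in for Self-Verified Feedback: in the paper this is a
    # self-evaluation prompt on the intermediate completion; here it is
    # faked as a scalar summary of the dummy confidences.
    return sum(traj.conf) / len(traj.conf)

def partial_remask(traj):
    # Local branching: copy the trajectory, keep high-confidence tokens,
    # and remask low-confidence ones so the branch can explore alternatives.
    child = Trajectory(list(traj.tokens), list(traj.conf))
    for i, c in enumerate(child.conf):
        if child.tokens[i] != "[MASK]" and c < 0.5:
            child.tokens[i] = "[MASK]"
            child.conf[i] = 0.0
    return child

def search(n_init=8, n_steps=12, window=(3, 8), keep=4):
    beams = [Trajectory() for _ in range(n_init)]
    for step in range(n_steps):
        beams = [denoise_step(b) for b in beams]
        if window[0] <= step < window[1]:
            # Hierarchical search window: prune low-scoring trajectories...
            beams.sort(key=self_verify_score, reverse=True)
            beams = beams[:keep]
            # ...and reallocate the freed compute by branching survivors.
            beams += [partial_remask(b) for b in beams[: n_init - keep]]
    return max(beams, key=self_verify_score)

print(" ".join(search().tokens))
```

Note that pruning inside the window is what caps the number of function evaluations: unlike best-of-$N$, trajectories judged unpromising at intermediate steps stop consuming denoising steps early.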