Lookahead Unmasking Elicits Reliable Decoding in Diffusion Language Models
Abstract
Masked Diffusion Models (MDMs) as language models generate text by iteratively unmasking tokens, yet their performance depends crucially on the inference-time unmasking order. Conventional strategies such as confidence-based sampling are short-sighted: they optimize locally, make no use of additional test-time computation, and allow early decoding errors to cascade. We propose Lookahead Unmasking (LookUM), which addresses these concerns by guiding the sampling path with a verifier over alternative unmasking orders, without requiring an external reward model. Our framework couples (i) a path generator that proposes paths by sampling from pools of unmasking sets with (ii) a verifier that computes the uncertainty of the proposed paths and performs importance sampling to select the final paths. Erroneous unmasking inflates sequence-level uncertainty, and our method exploits this signal to avoid error-prone trajectories. We validate our framework across six benchmarks spanning mathematics, planning, and coding, and demonstrate consistent performance improvements. LookUM requires only two to three paths to reach peak performance. LLaDA with LookUM matches the performance of RL-tuned LLaDA 1.5 and yields additional gains when applied to LLaDA 1.5 itself, suggesting complementarity with reinforcement learning.
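To make the generator/verifier coupling concrete, the following is a minimal sketch of one LookUM-style decoding step, not the paper's implementation. All names (`lookum_step`, `k_paths`, `tokens_per_step`) are illustrative assumptions, and the verifier's sequence-level uncertainty, which in the full method would involve re-evaluating the model along each proposed path, is approximated here by per-token entropies from a single forward pass.

```python
import torch

def lookum_step(logits, mask, k_paths=3, tokens_per_step=4, temperature=1.0):
    """One illustrative LookUM-style step (hypothetical signature).

    logits: (L, V) model logits for a partially masked sequence.
    mask:   (L,) boolean tensor, True where the token is still masked.
    Returns the positions to unmask this step, chosen by importance
    sampling over k_paths candidate unmasking sets scored by uncertainty.
    """
    probs = torch.softmax(logits / temperature, dim=-1)                  # (L, V)
    token_entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)      # (L,)
    masked_idx = mask.nonzero(as_tuple=True)[0]
    n = min(tokens_per_step, masked_idx.numel())

    # (i) Path generator: propose candidate unmasking sets, favoring
    # low-entropy (confident) positions but keeping the proposal stochastic.
    proposal = torch.softmax(-token_entropy[masked_idx], dim=0)
    candidates = [
        masked_idx[torch.multinomial(proposal, n, replacement=False)]
        for _ in range(k_paths)
    ]

    # (ii) Verifier: score each candidate set by the total uncertainty of
    # the tokens that would remain masked after that step (a stand-in for
    # the paper's sequence-level uncertainty; lower is better).
    scores = []
    for cand in candidates:
        remaining = mask.clone()
        remaining[cand] = False
        scores.append(token_entropy[remaining].sum())
    scores = torch.stack(scores)

    # Importance sampling over paths: lower uncertainty -> higher weight.
    path_weights = torch.softmax(-scores, dim=0)
    choice = torch.multinomial(path_weights, 1).item()
    return candidates[choice]
```

In this sketch the loop that calls `lookum_step` would unmask the returned positions, re-run the MDM, and repeat until no masked tokens remain; the abstract's observation that two to three paths suffice corresponds to a small `k_paths`.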