LUGS: Latent-aware Guidance for Efficient Unmasking in Diffusion Large Language Models
Abstract
Diffusion Language Models (DLMs) have emerged as a flexible alternative to autoregressive (AR) models: they can decode tokens in any order, but generation quality depends critically on the decoding strategy. Existing approaches rely predominantly on local heuristics, such as token confidence or entropy, which fail to capture sequence-level dependencies and contextual semantics. To address this limitation, we propose Latent-aware Unmasking Guidance Search (\LUGS{}), a novel decoding framework that leverages the model's internal hidden states to guide the unmasking process. By incorporating latent-aware scores that compensate for the shortcomings of local heuristics, \LUGS{} improves generation quality. Extensive experiments on diverse downstream tasks demonstrate that our approach consistently outperforms existing baselines on the LLaDA-8B-Instruct and LLaDA-1.5 models: \LUGS{} improves performance by more than 1\% on Science and Reason tasks for both base models, and achieves an average improvement of 3.5\% on code generation. Remarkably, \LUGS{} outperforms the beam search baseline by more than 5\% on average on code tasks with LLaDA-8B-Instruct. These results highlight the potential of latent-aware guidance for advancing controllable, high-quality generation.