Beyond Confidence: Adaptive and Coherent Decoding for Diffusion Language Models
Abstract
Diffusion Language Models (DLMs) have recently achieved significant success owing to their any-order generation capabilities. However, existing inference methods typically rely on local, immediate-step metrics such as confidence or entropy, which lack a broader, more reliable view of the sequence being generated and thus lead to sub-optimal generation quality. To address this, we propose Coherent Contextual Decoding (CCD), a novel inference framework built on two core innovations. First, CCD mitigates the bias of conditioning on a single context by leveraging historical contexts to approximate the marginal distribution of token predictions, yielding better sequence coherence and enabling the early rejection of sub-optimal decoding paths. Second, we demonstrate that this mechanism is theoretically equivalent to modeling the consistency of historical steps via the conditional mutual information between contexts and token predictions. Finally, CCD degrades far more gracefully than baselines under highly parallel decoding. Empirically, our method simultaneously improves inference speed and generation quality across diverse benchmarks on Dream and LLaDA.
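To make the abstract's claim concrete, here is a minimal sketch in our own hypothetical notation, not taken from the paper: let $c_1, \dots, c_t$ denote the partial contexts observed at successive denoising steps, and let $p_\theta(x \mid c_s)$ be the model's predictive distribution for a masked token $x$ under context $c_s$. Averaging predictions over historical contexts approximates the marginal, and if $c$ is treated as a random variable drawn uniformly from $\{c_1, \dots, c_t\}$, the standard identity for mutual information links that marginal to a consistency measure:
\[
\bar{p}(x) \;=\; \frac{1}{t}\sum_{s=1}^{t} p_\theta(x \mid c_s),
\qquad
I(x;\, c) \;=\; \mathbb{E}_{c}\!\left[\,\mathrm{KL}\!\left(p_\theta(x \mid c)\;\middle\|\;\bar{p}(x)\right)\right].
\]
Under this reading, a token whose prediction is stable across historical contexts has low mutual information with the context, i.e., high historical consistency; the paper's statement involves a conditional mutual information, while this unconditional version is only meant to convey the intuition.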