Consistent Diffusion Language Models
Abstract
Diffusion language models (DLMs) promise sublinear-time generation via parallel decoding, yet realizing this efficiency remains elusive because high-quality sampling typically requires hundreds of refinement steps. In continuous domains, consistency-based training accelerates diffusion by enforcing invariance along a probability flow ODE. However, discrete diffusion admits no such ODE, rendering direct adaptation ill-defined. We bridge this gap with Multi-Path Discrete Consistency (MPDC), a new principle that replaces the non-existent unique trajectory with a distributional ensemble of exact posterior bridges connecting different noise levels. Building on this idea, we introduce the Consistent Diffusion Language Model (CDLM), a general framework that learns path-independent denoisers by enforcing prediction consistency across these stochastic bridges. We show that CDLM unifies and generalizes discrete diffusion, consistency, and distillation objectives within a single formulation applicable to diverse corruption processes, including both masked and uniform diffusion. Empirically, CDLM establishes a new state of the art on conditional and unconditional text-generation benchmarks, consistently outperforming strong base DLMs and often even multi-stage distilled baselines, with particularly large gains in the few-step regime. Together, these results position CDLM as a principled and scalable paradigm for efficient, high-fidelity discrete generative modeling.
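To make the consistency principle concrete, the following is a minimal sketch of what an MPDC-style objective could look like, under assumptions not spelled out in the abstract: a corruption process q, noise levels s < t, a denoiser f_\theta that predicts the clean sequence x_0, a stop-gradient (e.g., EMA) target f_{\theta^-}, and a divergence D such as a token-level KL. The notation is illustrative, not the paper's exact formulation.

\mathcal{L}_{\mathrm{MPDC}}(\theta) \;=\; \mathbb{E}_{\,x_0,\; s<t,\; x_t \sim q(x_t \mid x_0),\; x_s \sim q(x_s \mid x_t,\, x_0)} \Big[ D\big( f_\theta(x_s, s),\; f_{\theta^-}(x_t, t) \big) \Big]

Here x_s \sim q(x_s \mid x_t, x_0) is one exact posterior bridge between the two noise levels; drawing several such bridges and enforcing agreement among the resulting predictions is one natural reading of the "multi-path" ensemble that stands in for the unique trajectory of a probability flow ODE.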