Diffusion Language Model Parallel Decoding via Product-of-Experts Bridge
Juntong Shi ⋅ Brian Trippe ⋅ Jure Leskovec ⋅ Stefano Ermon ⋅ Minkai Xu
Abstract
Diffusion language models (DLMs) offer substantial speed advantages through parallel decoding, but the lack of token dependencies limits generation quality compared to autoregressive (AR) models. Recent work attempts to bridge this gap via importance sampling, with the DLM as the proposal and the AR model as the target. However, due to the large gap between their probability spaces, sampling requires a large number of particles and thus incurs expensive computation. In this paper, we introduce PoE-Bridge, a novel decoding framework that drastically improves generation speed and accuracy by introducing an intermediate distribution to bridge the gap. The distribution is constructed as a Product-of-Experts (PoE) of the DLM proposal and the AR target. With this intermediate distribution, we first conduct multi-token sampling with the DLM and then apply rejection sampling against the PoE to retain only verified tokens. The generated chunks are then evaluated by the AR target via importance sampling to produce the final faithful generation. We further propose several improved techniques, including mixed-temperature sampling for enhanced diversity and elastic rejection windows for reducing wasted verification. Empirically, PoE-Bridge achieves significantly improved accuracy with a $5\times$ speedup over standard DLM decoding, recovering at least 95% of the target AR model's performance and closing most of the quality gap on challenging mathematical reasoning and coding tasks.
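To make the decoding procedure concrete, the sketch below illustrates one PoE-Bridge step under toy assumptions: `dlm_chunk_probs` and `ar_next_probs` are hypothetical stand-ins (not the paper's models) for the DLM proposal and AR target, the vocabulary and chunk sizes are arbitrary, and the acceptance rule is a speculative-decoding-style rejection test against the PoE distribution. It is a minimal illustration of the three stages (parallel proposal, PoE verification, importance weighting toward the AR target), not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
V = 8          # toy vocabulary size (assumption for this sketch)
CHUNK = 4      # tokens proposed per parallel DLM step (assumption)

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

# --- Hypothetical stand-ins for the two experts --------------------------
def dlm_chunk_probs(prefix):
    """Toy DLM: independent per-position distributions for a CHUNK-token block."""
    return softmax(rng.normal(size=(CHUNK, V)))

def ar_next_probs(prefix):
    """Toy AR target: next-token distribution conditioned on the prefix."""
    return softmax(np.cos(np.arange(V) + len(prefix)))

# --- One PoE-Bridge decoding step -----------------------------------------
def poe_bridge_step(prefix):
    p_dlm = dlm_chunk_probs(prefix)            # parallel multi-token proposal
    accepted, log_w = [], 0.0
    for i in range(CHUNK):
        ar = ar_next_probs(prefix + accepted)  # AR conditional at this position
        q = p_dlm[i] * ar                      # unnormalized PoE of the two experts
        q /= q.sum()
        x = rng.choice(V, p=p_dlm[i])          # sample from the DLM proposal
        if rng.random() < min(1.0, q[x] / p_dlm[i][x]):  # keep only verified tokens
            accepted.append(int(x))
            log_w += np.log(ar[x]) - np.log(q[x])  # IS weight toward the AR target
        else:
            break                              # stop the chunk at the first rejection
    return accepted, log_w

prefix = []
chunk, log_w = poe_bridge_step(prefix)
print("accepted chunk:", chunk, "log importance weight:", round(log_w, 3))
```

In a multi-particle setting, the accumulated log weights would be used to resample chunks against the AR target, which is where the intermediate PoE distribution pays off: because the proposal is already pulled toward the target, far fewer particles are needed than when importance-sampling from the raw DLM.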