DC-Leap: Training-Free Acceleration of dLLMs via Draft-Guided Contiguous Leaping Decoding
Yanhua Jiao ⋅ Tianyi Wu ⋅ Xiaoxi Sun ⋅ Yulin Li ⋅ Huiling Zhen ⋅ Libo Qin ⋅ Baotian Hu ⋅ Zhuotao Tian ⋅ Min Zhang
Abstract
While parallel decoding is central to the efficiency of Diffusion Large Language Models (dLLMs), current strategies are often hindered by overly conservative confidence thresholds. These thresholds, necessitated by the Joint Probability Dependence Error (JPDE), lead to redundant denoising iterations and suboptimal inference speed. To overcome this, we propose DC-Leap, a training-free framework that enables reliable acceleration of dLLMs in the moderate-confidence regime. DC-Leap introduces a Dynamic Contiguous Verification strategy that integrates strictly ordered causal constraints into the parallel decoding process. By progressively validating token dependencies, this mechanism effectively neutralizes the JPDE, enabling reliable acceleration with near-lossless performance. Furthermore, DC-Leap incorporates a draft-guided decoding mechanism, in which the draft leaps forward across multiple tokens to extend the context, providing look-ahead information while retaining the structural benefits of bidirectional attention during inference. Extensive experiments on standard benchmarks demonstrate that DC-Leap achieves substantial speedups of up to **53.19$\times$** on MBPP for long-sequence generation, and up to **105.02$\times$** when combined with a KV-Cache, while maintaining comparable generation quality. Code and models will be made publicly available.
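To make the contiguous-verification idea concrete, the sketch below contrasts plain confidence thresholding, which accepts positions independently, with a left-to-right contiguous acceptance rule in which every accepted token is preceded only by already-accepted context. The function names, thresholds, and confidence values are illustrative assumptions for exposition, not the paper's implementation.

```python
# Hypothetical sketch: contiguous-prefix acceptance vs. independent thresholding
# for parallel decoding over a masked span. Illustrative only; not DC-Leap itself.
from typing import List


def accept_independent(confidences: List[float], threshold: float) -> List[int]:
    """Baseline: accept every position whose confidence clears the threshold,
    even if an earlier position in the span was rejected."""
    return [i for i, c in enumerate(confidences) if c >= threshold]


def accept_contiguous_prefix(confidences: List[float], threshold: float) -> List[int]:
    """Causally ordered acceptance: scan left to right and stop at the first
    position below the threshold, so each accepted token is conditioned only
    on tokens that were themselves accepted."""
    accepted = []
    for i, c in enumerate(confidences):
        if c < threshold:
            break
        accepted.append(i)
    return accepted


if __name__ == "__main__":
    # Example per-position confidences over a masked span of 6 tokens.
    confs = [0.97, 0.91, 0.88, 0.62, 0.93, 0.95]
    print(accept_independent(confs, threshold=0.90))        # [0, 1, 4, 5] -- skips position 3
    print(accept_contiguous_prefix(confs, threshold=0.85))  # [0, 1, 2] -- stops at the weak token
```

The contiguous rule allows a lower (moderate) threshold because no accepted token depends on a position that was left unresolved, which is the failure mode that independent acceptance exposes.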