Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning
Abstract
Recent diffusion large language models (dLLMs) have demonstrated both effectiveness and efficiency in reasoning via a block-based semi-autoregressive generation paradigm. Despite this progress, fixed-size block generation remains a critical bottleneck for effective and coherent reasoning. (I) From a global perspective, different reasoning tasks correspond to different optimal decoding block sizes, which makes a "one-size-fits-all" assumption ineffective. (II) Even within a single reasoning task, rigid block partitioning can break the logical flow and reduce reasoning coherence. Through empirical observations, we reveal that, in terms of block-wise entropy, incorrect reasoning exhibits a fluctuating, unsteady trend across blocks, whereas correctly solved tasks follow a consistently descending trend. Therefore, this paper proposes b1, a novel post-training framework that learns dynamic-size reasoning blocks via a Monotonic Entropy Descent objective with reinforcement learning to enhance reasoning coherence. b1 integrates seamlessly as a plug-and-play module with existing dLLM post-training algorithms. Extensive experiments across various reasoning benchmarks demonstrate that b1 consistently improves over fixed-size block baselines. Our code is provided.
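To make the block-wise entropy observation concrete, the following is a minimal sketch, assuming token-level entropies averaged within each decoding block and a reward that penalizes entropy increases between consecutive blocks. Function names such as `block_entropies` and `monotonic_descent_reward`, and the specific reward form, are illustrative assumptions, not the paper's released implementation or exact objective.

```python
# Illustrative sketch (not the authors' implementation): block-wise mean token
# entropy and a simple monotonic-descent reward signal.
import torch
import torch.nn.functional as F

def block_entropies(logits: torch.Tensor, block_size: int) -> torch.Tensor:
    """Mean token entropy per decoding block.

    logits: (seq_len, vocab_size) pre-softmax scores for one generated sequence.
    Returns a (num_blocks,) tensor of average entropies.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    token_entropy = -(log_probs.exp() * log_probs).sum(dim=-1)  # (seq_len,)
    num_blocks = token_entropy.shape[0] // block_size
    blocks = token_entropy[: num_blocks * block_size].view(num_blocks, block_size)
    return blocks.mean(dim=-1)  # (num_blocks,)

def monotonic_descent_reward(entropies: torch.Tensor) -> torch.Tensor:
    """Penalize entropy increases between consecutive blocks (hypothetical reward)."""
    increases = torch.clamp(entropies[1:] - entropies[:-1], min=0.0)
    return -increases.sum()  # 0 when block-wise entropy descends monotonically

# Usage example: score a sequence of 128 tokens decoded in blocks of 32.
logits = torch.randn(128, 32000)
reward = monotonic_descent_reward(block_entropies(logits, block_size=32))
```

Under this formulation, a correctly reasoned trajectory whose block-wise entropy keeps falling incurs no penalty, while fluctuating entropy yields a negative reward that an RL objective could then optimize against.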