Poster in Workshop: RLxF: RL from World Feedback
Fri, Jul 10, 2026 • 12:00 AM – 1:00 AM PDT

Single-Step Initialization for Exploratory Parallel Rollouts in Diffusion LLMs

Dongjae Jeon ⋅ Bumjun Kim ⋅ Mingyu Kim ⋅ Albert No

Abstract

We propose training-free parallel decoding for rollout generation in diffusion large language model (dLLM) policy optimization, which reduces rollout cost without auxiliary models or policy modification. We find, however, that confidence-based decoding suffers from delayed branching, and that parallel decoding largely inherits this behavior: rollouts agree on both the unmasked tokens and the unmasking positions for much of generation, and the resulting lack of exploration weakens the group-relative learning signal. We address this with a minimal initialization step in which each rollout independently unmasks one uniformly random position, after which the original sampler resumes unchanged. The intervention is drop-in compatible with any sampling strategy. Combined with Fast-dLLM on LLaDA-8B-Instruct, it improves rollout diversity and yields stronger downstream RL performance on GSM8K and MATH-500.
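The initialization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `MASK`, `single_step_init`, and `toy_sample_token` are hypothetical, sequences are plain Python lists rather than model tensors, and the abstract does not say how the token at the chosen position is selected, so the sketch delegates that to a caller-supplied function.

```python
import random
from typing import Callable, List, Optional, Sequence

MASK = None  # hypothetical stand-in for the model's mask token


def single_step_init(
    rollouts: List[List[Optional[int]]],
    sample_token: Callable[[Sequence[Optional[int]], int], int],
    rng: random.Random,
) -> List[int]:
    """Unmask one uniformly random masked position in each rollout.

    Each rollout draws its position independently, so rollouts in the same
    group tend to start from different partial sequences. Returns the
    position chosen for each rollout.
    """
    chosen = []
    for seq in rollouts:
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        if not masked:
            raise ValueError("rollout has no masked positions left")
        pos = rng.choice(masked)           # uniform over masked positions
        seq[pos] = sample_token(seq, pos)  # assumption: token comes from the policy
        chosen.append(pos)
    return chosen


def toy_sample_token(seq: Sequence[Optional[int]], pos: int) -> int:
    # Placeholder for sampling from the policy's distribution at `pos`.
    return 7


rng = random.Random(0)
prompt = [101, 102]
# A group of 4 rollouts, each with the same prompt and 8 masked positions.
group = [prompt + [MASK] * 8 for _ in range(4)]
positions = single_step_init(group, toy_sample_token, rng)
# After this call, the original sampler (for example a confidence-based
# or parallel decoder) would continue on each rollout without changes.
```

Because the step only touches the initial state of each rollout, the sampler that runs afterward needs no modification, which is what makes the intervention drop-in.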