Exploring More to Solve More: Boosting Diversity in Text Diffusion Models via Entropy-Based Guidance
Abstract
Although diffusion models have revolutionized continuous domains like image synthesis through high-quality generation and controllable guidance mechanisms, bringing this controllability to the discrete, sequential nature of text remains an open challenge. Moreover, current sampling strategies and guidance methods adjust token likelihoods without capturing the broader semantic landscape, leading to a suboptimal balance between fidelity and diversity. In this work, we introduce Semantic-Aware Kernel Entropy (SAKE), a novel training-free guidance method. Our method computes the order-2 Rényi entropy over a kernel Gram matrix that captures both cross-token semantic interactions and relative token positions. By linearizing this objective in the embedding space, we derive a tractable guidance signal that dynamically adjusts the sampling distribution: flattening it to encourage exploration when generations are redundant, and sharpening it to preserve fidelity when they are already diverse. Experiments demonstrate that our approach achieves a superior Pareto frontier between fidelity and diversity, and improves multi-sample performance on reasoning-intensive tasks, such as code and mathematics generation, compared to temperature scaling and discrete guidance baselines.
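To make the core quantity concrete, the following is a minimal sketch of a matrix-based order-2 Rényi entropy over a kernel Gram matrix combining semantic and positional structure, and of how it could modulate a sampling temperature. All names, the RBF/positional kernel choice, and the temperature schedule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sake_entropy(emb, gamma=1.0, beta=0.1):
    """Order-2 Renyi entropy of a kernel Gram matrix over token embeddings.

    emb: (n, d) array of token embeddings.
    The kernel mixes semantic similarity (an assumed RBF on embeddings)
    with a positional term decaying in relative token distance.
    """
    n = emb.shape[0]
    # pairwise squared distances between embeddings
    sq = np.sum((emb[:, None, :] - emb[None, :, :]) ** 2, axis=-1)
    # relative token positions |i - j|
    pos = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    K = np.exp(-gamma * sq) * np.exp(-beta * pos)
    A = K / np.trace(K)                # normalize so eigenvalues sum to 1
    return -np.log(np.trace(A @ A))    # H_2 = -log sum_i lambda_i^2

def guided_temperature(emb, base_tau=1.0, strength=0.5):
    """Flatten the sampling distribution when entropy is low (redundant
    tokens) and sharpen it when entropy is high (already diverse)."""
    n = emb.shape[0]
    h = sake_entropy(emb)
    h_max = np.log(n)                  # entropy of a maximally diverse set
    # low h -> tau > base_tau (explore); high h -> tau < base_tau (exploit)
    return base_tau * (1.0 + strength * (1.0 - 2.0 * h / h_max))
```

Under this kernel, a set of identical (redundant) embeddings yields a near rank-one Gram matrix and hence low entropy, raising the temperature, while well-separated embeddings push the normalized matrix toward I/n and the entropy toward its maximum log n, lowering it.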