CANDI: Hybrid Discrete-Continuous Diffusion Models
Abstract
While continuous diffusion has shown remarkable success in continuous domains such as image generation, its direct application to discrete data has underperformed purely discrete formulations. To understand this gap, we introduce token identifiability, an analytical framework that characterizes how Gaussian noise corrupts discrete data through two mechanisms: discrete identity corruption and continuous rank degradation. We show that these mechanisms scale differently with vocabulary size, creating a temporal dissonance that forces a tradeoff between learning continuous geometry and learning discrete structure. To address this, we propose CANDI (Continuous ANd DIscrete diffusion), a hybrid framework that decouples discrete and continuous corruption, enabling simultaneous learning of both. This unlocks the benefits of continuous diffusion for discrete spaces: on controlled generation, CANDI enables classifier-based guidance with off-the-shelf classifiers through simple gradient addition; on text generation, CANDI outperforms masked diffusion at a low number of function evaluations (NFE), demonstrating the value of learning continuous gradients for discrete spaces.
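As an illustration of the claim that the two corruption mechanisms scale differently with vocabulary size, the following is a minimal Monte Carlo sketch, not the paper's own formulation. It assumes tokens are one-hot vectors perturbed by isotropic Gaussian noise, reads "discrete identity corruption" as the probability that the argmax of the noisy vector is no longer the true token, and reads "continuous rank degradation" as the average fraction of incorrect tokens whose noisy value exceeds that of the true token. The function name `corruption_stats` and its parameters are hypothetical.

```python
import numpy as np

def corruption_stats(vocab_size, sigma, n_trials=2000, seed=0):
    """Estimate two corruption measures for a noisy one-hot token.

    Assumed (illustrative) definitions, not taken from the paper:
      identity corruption: P(argmax of the noisy vector != true token)
      rank degradation:    mean fraction of wrong tokens that outrank
                           the true token after noise is added
    """
    rng = np.random.default_rng(seed)
    # Without loss of generality the true token is index 0.
    noisy = rng.normal(0.0, sigma, size=(n_trials, vocab_size))
    noisy[:, 0] += 1.0  # one-hot signal plus Gaussian noise

    identity_corrupted = float(np.mean(noisy.argmax(axis=1) != 0))
    # Compare every wrong coordinate with the true coordinate of the same trial.
    rank_degraded = float(np.mean(noisy[:, 1:] > noisy[:, :1]))
    return identity_corrupted, rank_degraded

# Same noise level, increasing vocabulary size.
for v in (10, 100, 1000):
    ident, rank = corruption_stats(v, sigma=0.5)
    print(f"V={v:5d}  identity corruption={ident:.3f}  rank degradation={rank:.3f}")
```

Under these assumptions, the rank measure depends only on the noise level, since each pairwise comparison involves two coordinates regardless of how many tokens exist, whereas the identity measure grows with vocabulary size because the true token must beat every competitor at once. This mismatch is one way to picture the temporal dissonance described above.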