ProtoVAR: Efficient Dataset Distillation via Prototype-Guided Visual Autoregressive Modeling
Abstract
Recent advances in generative distillation have shown strong potential for constructing high-quality surrogate datasets in a fraction of the time required by optimization-based approaches. However, most existing generative solutions rely on diffusion models, which suffer from two limitations. (i) Indirect matching objectives: the sequential denoising process makes it difficult to directly match representative prototypes. (ii) Target-agnostic generation: the generation process is often decoupled from the target task, causing the synthesized samples to drift from the desired distribution. Building on this insight, we propose ProtoVAR, a prototype-guided visual autoregressive framework. Instead of iterative denoising in latent space, ProtoVAR exploits the coarse-to-fine next-scale prediction of Visual AutoRegressive (VAR) modeling to maintain semantic consistency throughout generation. By injecting multi-scale class prototypes, ProtoVAR enforces explicit representativeness constraints while preserving diversity. A pool-based selector further distills the prototype-guided outputs into a compact, task-aligned surrogate dataset. Extensive experiments show that ProtoVAR achieves state-of-the-art performance at comparable or lower computational cost than diffusion-based distillation.
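The pipeline sketched in the abstract, multi-scale prototype injection during coarse-to-fine generation followed by pool-based selection, can be illustrated with a minimal toy sketch. All names and the blending rule below are illustrative assumptions, not the paper's actual implementation; `alpha` stands in for a hypothetical prototype-injection strength, and random noise stands in for VAR's next-scale predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

def prototype_guided_generate(prototypes, scales, alpha=0.5):
    """Toy coarse-to-fine generation: at each scale, blend a stand-in
    model proposal with the class prototype at that resolution.
    (Hypothetical sketch; not the actual ProtoVAR procedure.)"""
    sample = None
    for s, proto in zip(scales, prototypes):
        proposal = rng.normal(size=(s, s))  # stand-in for VAR's next-scale prediction
        if sample is not None:
            # condition on the coarser result via nearest-neighbor upsampling
            factor = s // sample.shape[0]
            proposal = proposal + np.kron(sample, np.ones((factor, factor)))
        sample = (1 - alpha) * proposal + alpha * proto  # prototype injection
    return sample

def pool_select(pool, prototype, k):
    """Toy pool-based selector: keep the k candidates closest to the prototype."""
    dists = [np.linalg.norm(c - prototype) for c in pool]
    order = np.argsort(dists)[:k]
    return [pool[i] for i in order]

# Toy multi-scale class prototypes at 4x4 and 8x8 resolution
scales = [4, 8]
prototypes = [np.ones((4, 4)), np.ones((8, 8))]
pool = [prototype_guided_generate(prototypes, scales) for _ in range(16)]
surrogate = pool_select(pool, prototypes[-1], k=4)
```

The selector here ranks candidates by Euclidean distance to the finest-scale prototype; the paper's task-aligned criterion is presumably richer, but the pool-then-select structure is the same.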