Efficient Multi-modal Dataset Distillation via Analytic Parameter Matching
Deyu Bo ⋅ Xinchao Wang
Abstract
Multi-modal dataset distillation (MDD) seeks to compress large-scale multi-modal datasets into a compact set of synthetic pairs. Existing methods employ a dual-trajectory matching framework to align the teacher and student models within each modality. While effective, this paradigm incurs non-negligible memory and computational overhead due to checkpoint storage and bi-level optimization over the synthetic data. To address these limitations, we propose analytic parameter matching (APM), which theoretically derives the analytic parameters of modal projectors to replace the inner-loop optimization, and then aligns the analytic projector parameters of the teacher and student models. APM offers two key advantages: (1) it replaces checkpoint-intensive storage with only two cached matrices, significantly reducing memory consumption; and (2) it computes analytic parameters in a single forward pass, thereby avoiding costly bi-level optimization. Empirically, APM achieves up to 65$\times$ storage reduction and 9.6$\times$ faster distillation, while scaling to 1,000 synthetic pairs. Extensive experiments on image-text and audio-text benchmarks demonstrate the effectiveness of APM in cross-modal retrieval tasks, \eg, 12.8 IR@1 and 17.8 TR@1 on Flickr30k with 100 synthetic pairs. Moreover, APM exhibits notable generalization in cross-architecture evaluation and zero-shot classification tasks.
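The abstract does not spell out the closed-form derivation, but the general idea of an analytic projector can be sketched as follows. Assuming (hypothetically) that the modal projector is linear and its parameters are given by a ridge-regression-style solution, the two cached matrices would be the feature Gram matrix and the cross-modal correlation matrix, from which the projector weights follow in one pass with no inner-loop gradient steps. All names and the regularization constant below are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def analytic_projector(feats, targets, lam=1e-3):
    """Closed-form (ridge) projector W = (X^T X + lam*I)^{-1} X^T Y.

    Computed in a single pass over the features; the two cached
    matrices (G and C) are all that must be stored, replacing
    checkpoint trajectories. This is an illustrative sketch, not
    the paper's exact derivation.
    """
    G = feats.T @ feats          # cached matrix 1: Gram matrix, d_in x d_in
    C = feats.T @ targets        # cached matrix 2: cross term, d_in x d_out
    W = np.linalg.solve(G + lam * np.eye(G.shape[0]), C)
    return W, G, C

# Toy usage with random stand-ins for modality features:
rng = np.random.default_rng(0)
X = rng.standard_normal((512, 64))   # e.g. image-encoder features
Y = rng.standard_normal((512, 32))   # e.g. text-encoder features
W, G, C = analytic_projector(X, Y)
# Matching could then compare teacher vs. student analytic parameters,
# e.g. loss = ||W_teacher - W_student||_F^2  (hypothetical objective).
```

Because `W` is obtained by solving a linear system rather than by bi-level optimization over synthetic data, both the storage (two matrices instead of model checkpoints) and the compute (one forward pass plus one solve) stay small, consistent with the savings the abstract reports.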