Utility Boundary of Dataset Distillation: Scaling and Coverage Laws
Zhengquan Luo ⋅ Zhiqiang Xu
Abstract
Dataset distillation (DD) aims to replace a full training set with a tiny synthetic one, yet current theories neither explain why heterogeneous matching objectives (gradient, distribution, trajectory) work nor provide a quantitative boundary for robustness under configuration changes (optimizer, architecture, augmentation). We propose configuration-dynamics-error (CDE) analysis for a broad class of matching-based DD methods: a unified generalization framework that treats the training configuration as an update operator inducing optimization dynamics and measures distillation robustness by the test-risk gap between models trained on distilled versus full data. Within this framework, gradient, distribution, and trajectory matching are all shown to reduce the same dynamics-induced risk gap, explaining why these heterogeneous objectives work. CDE yields two predictive laws. First, within a fixed configuration, the gap decays as $\mathcal{O}(k^{-1/2})$ in the distilled set size $k$ until it reaches a configuration-dependent floor, which explains the ubiquitous IPC saturation and indicates when improving the floor matters more than enlarging $k$. Second, we formalize a utility boundary via an order-tight coverage law: the required $k$ grows linearly with the configuration diversity, as captured by a covering-number complexity. Experiments with representative DD methods and configuration changes exhibit behavior consistent with both laws.
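In schematic form (using $C$, $\varepsilon_{\mathrm{cfg}}$, $\mathcal{C}$, $\rho$, and $\mathcal{N}$ as illustrative symbols introduced here for exposition, not the paper's own notation), the two laws described in the abstract can be read as:
\[
  \underbrace{\Delta R(k)}_{\text{test-risk gap}} \;\lesssim\; \frac{C}{\sqrt{k}} \;+\; \varepsilon_{\mathrm{cfg}}
  \qquad \text{(scaling law: decay in $k$ down to a configuration-dependent floor)}
\]
\[
  k_{\mathrm{req}}(\epsilon) \;=\; \Theta\!\bigl(\mathcal{N}(\mathcal{C},\rho,\epsilon)\bigr)
  \qquad \text{(coverage law: required distilled size grows linearly with configuration diversity)}
\]
where $\mathcal{N}(\mathcal{C},\rho,\epsilon)$ denotes a covering number of the configuration class $\mathcal{C}$ at scale $\epsilon$ under some metric $\rho$.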