

Poster

SelMatch: Effectively Scaling Up Dataset Distillation via Selection-Based Initialization and Partial Updates by Trajectory Matching

Yongmin Lee · Hye Won Chung


Abstract:

Dataset distillation aims to synthesize a small number of images per class (IPC) from a large dataset so that training on them approximates full-dataset training with minimal performance loss. While effective at very small IPC, many distillation methods become less effective as IPC increases, even underperforming random sample selection. We investigate this by examining state-of-the-art trajectory-matching-based distillation methods at various IPC scales, and find that their reduced efficacy at larger IPC is partially attributable to a focus on easy features of the dataset: even with increased IPC, they fail to incorporate the complex patterns of the real dataset. To address this, we introduce SelMatch, a novel distillation method that scales effectively with IPC. SelMatch uses selection-based initialization and partial updates through trajectory matching to tailor the difficulty level of the synthetic dataset to the IPC scale. Tested on CIFAR-10/100 and TinyImageNet, SelMatch outperforms leading selection-only and distillation-only methods across subset ratios from 5% to 30%.
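The two ingredients named in the abstract, selection-based initialization and partial updates, can be illustrated with a minimal sketch. This is not the authors' implementation: the function `select_and_split`, the per-sample `difficulty` scores, the `window_start` and `update_fraction` parameters, and the choice to sort hardest-first are all assumptions introduced here for illustration. The trajectory-matching update itself is omitted; the sketch only shows how a real-image subset might be chosen at a target difficulty and then split into a part left fixed and a part marked for distillation.

```python
import numpy as np

def select_and_split(labels, difficulty, ipc, window_start=0.0,
                     update_fraction=0.5, seed=0):
    """Pick `ipc` real samples per class and mark a fraction as updatable.

    All parameter names are hypothetical. `difficulty` is assumed to be a
    precomputed per-sample score (higher means harder).
    """
    rng = np.random.default_rng(seed)
    selected, updatable = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        # Order this class's samples from hardest to easiest (assumed convention).
        order = idx[np.argsort(-difficulty[idx], kind="stable")]
        # Slide a window of size `ipc` over the ordering; a larger
        # `window_start` skips more of the hardest samples.
        start = int(window_start * len(order))
        start = min(start, max(len(order) - ipc, 0))
        chosen = order[start:start + ipc]
        # Mark a random fraction of the chosen samples as updatable; these
        # would be refined by a distillation step, the rest stay as real images.
        n_update = int(round(update_fraction * len(chosen)))
        mask = np.zeros(len(chosen), dtype=bool)
        mask[rng.choice(len(chosen), size=n_update, replace=False)] = True
        selected.append(chosen)
        updatable.append(mask)
    return np.concatenate(selected), np.concatenate(updatable)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    labels = np.repeat(np.arange(3), 20)   # toy data: 3 classes, 20 samples each
    difficulty = rng.random(labels.size)   # placeholder scores
    sel, upd = select_and_split(labels, difficulty, ipc=5,
                                window_start=0.2, update_fraction=0.4)
    print(sel.shape, int(upd.sum()))
```

For scale, a 5% subset ratio on the 50,000-image CIFAR-10 training set corresponds to 2,500 images, or 250 IPC. In a sketch like this, one would presumably vary `window_start` and `update_fraction` with the subset ratio, which is one way to read the abstract's "difficulty level tailored to IPC scales".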
