Dataset Condensation via Efficient Synthetic-Data Parameterization
Jang-Hyun Kim · Jinuk Kim · Seong Joon Oh · Sangdoo Yun · Hwanjun Song · Joonhyun Jeong · Jung-Woo Ha · Hyun Oh Song

Thu Jul 21 03:00 PM -- 05:00 PM (PDT) @ Hall E #225

The great success of machine learning with massive amounts of data comes at the price of huge computation and storage costs for training and tuning. Recent studies on dataset condensation attempt to reduce the dependence on such massive data by synthesizing a compact training dataset. However, existing approaches have fundamental limitations in optimization due to the limited representability of synthetic datasets, as they do not exploit any regularity in the data. To this end, we propose a novel condensation framework that generates multiple synthetic data within a limited storage budget via efficient parameterization that accounts for data regularity. We further analyze the shortcomings of existing gradient matching-based condensation methods and develop an effective optimization technique for improving the condensation of training data information. We propose a unified algorithm that drastically improves the quality of condensed data over the current state of the art on CIFAR-10, ImageNet, and Speech Commands.
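The core of the parameterization idea can be illustrated with a toy sketch: instead of storing synthetic examples at full resolution, store more of them at reduced resolution and decode them back to training size with a fixed "multi-formation" function. The sketch below uses nearest-neighbor upsampling as a stand-in decoder; the function name and shapes are illustrative, not the authors' implementation.

```python
import numpy as np

def multi_formation(condensed, factor=2):
    """Toy decoder: nearest-neighbor upsample each stored low-resolution
    image into a full-resolution training example (an illustrative
    stand-in for a learned/fixed multi-formation function)."""
    return condensed.repeat(factor, axis=-2).repeat(factor, axis=-1)

# Storage budget: the pixel count of 10 full-resolution 3x32x32 images.
budget = 10 * 3 * 32 * 32

# Under the same budget, we can instead store 4x as many 3x16x16 images.
condensed = np.random.rand(40, 3, 16, 16).astype(np.float32)
assert condensed.size == budget  # same storage cost

# Decoding yields 40 full-resolution synthetic examples instead of 10.
decoded = multi_formation(condensed, factor=2)
assert decoded.shape == (40, 3, 32, 32)
```

The point of the sketch is only the accounting: a fixed, cheap decoder lets the same storage budget represent more synthetic training examples, which is what gives the condensed dataset greater effective representability.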

Author Information

Jang-Hyun Kim (Seoul National University)
Jinuk Kim (Seoul National University)
Seong Joon Oh (AI Lab, Naver)
Sangdoo Yun (Clova AI Research, NAVER Corp.)
Hwanjun Song (NAVER AI Lab)
Joonhyun Jeong (Clova Image Vision, NAVER Corp.)
Jung-Woo Ha (NAVER AI Lab)
Jung-Woo Ha

Jung-Woo Ha received his BS and PhD degrees in computer science from Seoul National University in 2004 and 2015, respectively. He received the 2014 Fall semester outstanding PhD dissertation award from the Computer Science Department of Seoul National University. He worked as a research scientist and tech lead at NAVER LABS and as research head of NAVER CLOVA. Currently, he is head of NAVER AI Lab at NAVER Cloud. He has contributed to the AI research community as Datasets and Benchmarks Co-chair for NeurIPS and Social Co-chair for ICML 2023 and NeurIPS 2022. He has also served on senior technical program committees, including as Area Chair for NeurIPS 2022 and 2023 and for ICML 2023, and as Senior Area Chair for COLING. His research interests include large language models, generative models, multimodal representation learning, and their practical applications to real-world problems. In particular, he has focused on practical task definitions and evaluation protocols for continual learning across various domains.

Hyun Oh Song (Seoul National University)
