Toward Dataset Distillation for Regression Problems
Abstract
Dataset distillation is an emerging technique that compresses large datasets into smaller synthetic datasets while preserving their essential learning characteristics. However, it remains under-studied for regression problems. This paper presents a theoretical framework for regression dataset distillation based on bilevel optimization, in which an inner loop optimizes model parameters on the distilled data and an outer loop refines the distilled dataset itself. For regularized linear regression, we derive closed-form inner solutions and establish approximation guarantees when the number of features exceeds the size of the distilled dataset, using Polyak-Łojasiewicz conditions to obtain linear convergence rates. Numerical experiments match our theoretical predictions with high coefficients of determination, validating the theory while reducing dataset size by an order of magnitude.
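To make the bilevel structure concrete, the following is a minimal sketch (not the paper's exact algorithm) of dataset distillation for ridge regression: the inner problem is solved in closed form on the distilled set, and the outer loop differentiates the real-data loss through that solve. All hyperparameter names (`lam`, `n_distill`, `lr`, `outer_steps`) and initialization choices are illustrative assumptions.

```python
# Sketch: bilevel dataset distillation for ridge regression (JAX).
# Assumed setup, not the authors' reference implementation.
import jax
import jax.numpy as jnp

def inner_solve(Xs, ys, lam):
    # Closed-form ridge solution on the distilled set:
    # w* = (Xs^T Xs + lam * I)^{-1} Xs^T ys
    d = Xs.shape[1]
    return jnp.linalg.solve(Xs.T @ Xs + lam * jnp.eye(d), Xs.T @ ys)

def outer_loss(distilled, X, y, lam):
    # Outer objective: loss of the inner solution evaluated on real data.
    Xs, ys = distilled
    w = inner_solve(Xs, ys, lam)
    return jnp.mean((X @ w - y) ** 2)

def distill(X, y, n_distill=10, lam=1e-2, lr=1e-2, outer_steps=500, seed=0):
    # Initialize the synthetic dataset randomly (illustrative choice).
    k1, k2 = jax.random.split(jax.random.PRNGKey(seed))
    d = X.shape[1]
    Xs = jax.random.normal(k1, (n_distill, d))
    ys = jax.random.normal(k2, (n_distill,))
    # Gradient of the outer loss with respect to the distilled data,
    # obtained by differentiating through the closed-form inner solve.
    grad_fn = jax.grad(outer_loss)
    for _ in range(outer_steps):
        gXs, gys = grad_fn((Xs, ys), X, y, lam)
        Xs -= lr * gXs
        ys -= lr * gys
    return Xs, ys
```

Because the inner problem has a closed-form solution, no unrolled inner gradient steps are needed; the outer gradient flows directly through the linear solve.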