Provable Bounds for the Learnability of Sample-Compressible Families from Noisy Samples
Arefe Boushehrian ⋅ Amir Najafi
Abstract
Learning distribution families over $\mathbb{R}^d$ is a fundamental problem in unsupervised learning and statistics. A central question in this setting is whether a given family of distributions possesses sufficient structure to be (at least) information-theoretically learnable and, if so, to characterize its sample complexity. Ashtiani et al. (2018) reformulated sample compressibility as a structural property of distribution classes, proving that it guarantees PAC-learnability; this discovery has since enabled a series of nearly tight sample complexity bounds for high-dimensional open problems. It has further been conjectured that the converse also holds: every learnable class admits a sample compression scheme, which would make the two notions equivalent. In this work, we establish that sample-compressible families remain learnable even from perturbed samples, subject to conditions that are minimax-necessary and sufficient. In particular, we assume samples are corrupted by additive, independent noise, and we derive sample complexity bounds for general sample-compressible classes in arbitrary dimensions with respect to both the $\ell_2$ norm and the total variation distance.