

Oral

Sharper Bounds for $\ell_p$ Sensitivity Sampling

David Woodruff · Taisuke Yasuda

Meeting Room 316 A-C

Abstract: In large scale machine learning, *random sampling* is a popular way to approximate datasets by a small representative subset of examples. In particular, *sensitivity sampling* is an intensely studied technique which provides provable guarantees on the quality of approximation, while reducing the number of examples to the product of the *VC dimension* $d$ and the *total sensitivity* $S$ in remarkably general settings. However, guarantees going beyond this general bound of $Sd$ are known in perhaps only one setting, for *$\ell_2$ subspace embeddings*, despite intense study of sensitivity sampling in prior work. In this work, we show the first bounds for sensitivity sampling for $\ell_p$ subspace embeddings for $p \neq 2$ that improve over the general $Sd$ bound, achieving a bound of roughly $S^{2/p}$ for $1 \le p < 2$ and $S^{2-2/p}$ for $2 < p < \infty$. For $1 \le p < 2$, we show that this bound is tight, in the sense that there exist matrices for which $S^{2/p}$ samples are necessary. Furthermore, our techniques yield further new results in the study of sampling algorithms, showing that the *root leverage score sampling* algorithm achieves a bound of roughly $d$ for $1 \le p < 2$, and that a combination of leverage score and sensitivity sampling achieves an improved bound of roughly $d^{2/p} S^{2-4/p}$ for $2 < p < \infty$. Our sensitivity sampling results yield the best known sample complexity for a wide class of structured matrices that have small $\ell_p$ sensitivity.
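To make the sampling scheme concrete, below is a minimal sketch of generic sensitivity sampling, not the paper's algorithm: each row is kept independently with probability proportional to its sensitivity score and rescaled so that the sampled $\ell_p$ norm is unbiased. The demo specializes to $p = 2$, where the $\ell_2$ sensitivities coincide with the leverage scores and are exactly computable via a QR decomposition; for general $p$, computing sensitivities is itself nontrivial, and the paper's contribution is sharper bounds on how many samples $m$ suffice. The function names `leverage_scores` and `sensitivity_sample` are illustrative, not from the paper.

```python
import numpy as np

def leverage_scores(A):
    # Row leverage scores: squared row norms of an orthonormal basis
    # for the column span of A. For p = 2 these coincide with the
    # l2 sensitivities, and they sum to rank(A) <= d.
    Q, _ = np.linalg.qr(A)
    return np.sum(Q**2, axis=1)

def sensitivity_sample(A, scores, m, p=2, seed=None):
    # Generic sensitivity sampling: keep row i independently with
    # probability q_i = min(1, m * s_i / S), where S = sum_i s_i, and
    # rescale kept rows by (1/q_i)^(1/p) so that ||SA x||_p^p is an
    # unbiased estimator of ||A x||_p^p.
    rng = np.random.default_rng(seed)
    S = scores.sum()
    q = np.minimum(1.0, m * scores / S)
    keep = rng.random(len(q)) < q
    return A[keep] * (1.0 / q[keep, None]) ** (1.0 / p)

# Demo for p = 2, where the sensitivities are exactly the leverage scores.
rng = np.random.default_rng(0)
A = rng.standard_normal((10_000, 20))
A[:5] *= 100.0  # a few high-sensitivity rows the sampler must not miss
SA = sensitivity_sample(A, leverage_scores(A), m=400, p=2, seed=1)
x = rng.standard_normal(20)
print(np.linalg.norm(SA @ x) / np.linalg.norm(A @ x))  # close to 1 w.h.p.
```

Note that rows with large scores get $q_i = 1$ and are kept deterministically, which is why the heavy rows in the demo survive the sampling; the quantity the paper sharpens is how large $m$ must be, as a function of $S$, $d$, and $p$, for this ratio to concentrate near 1 simultaneously for all $x$.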
