Timezone: »
Oral
Sharper Bounds for $\ell_p$ Sensitivity Sampling
David Woodruff · Taisuke Yasuda
In large scale machine learning, *random sampling* is a popular way to approximate datasets by a small representative subset of examples. In particular, *sensitivity sampling* is an intensely studied technique which provides provable guarantees on the quality of approximation, while reducing the number of examples to the product of the *VC dimension* $d$ and the *total sensitivity* $\mathfrak{S}$ in remarkably general settings. However, guarantees going beyond this general bound of $\mathfrak{S} d$ are known in perhaps only one setting, for *$\ell_2$ subspace embeddings*, despite intense study of sensitivity sampling in prior work. In this work, we show the first bounds for sensitivity sampling for $\ell_p$ subspace embeddings for $p\neq 2$ that improve over the general $\mathfrak{S} d$ bound, achieving a bound of roughly $\mathfrak{S}^{2/p}$ for $1\leq p2$ and $\mathfrak{S}^{2-2/p}$ for $2<p<\infty$. For $1\leq p<2$, we show that this bound is tight, in the sense that there exist matrices for which $\mathfrak{S}^{2/p}$ samples is necessary. Furthermore, our techniques yield further new results in the study of sampling algorithms, showing that the *root leverage score sampling* algorithm achieves a bound of roughly $d$ for $1\leq p<2$, and that a combination of leverage score and sensitivity sampling achieves an improved bound of roughly $d^{2/p}\mathfrak{S}^{2-4/p}$ for $2<p<\infty$. Our sensitivity sampling results yield the best known sample complexity for a wide class of structured matrices that have small $\ell_p$ sensitivity.
Author Information
David Woodruff (Carnegie Mellon University)
Taisuke Yasuda (School of Computer Science, Carnegie Mellon University)
Related Events (a corresponding poster, oral, or spotlight)
-
2023 Poster: Sharper Bounds for $\ell_p$ Sensitivity Sampling »
Thu. Jul 27th through Fri the 28th Room Exhibit Hall 1 #336
More from the Same Authors
-
2023 : Sequential Attention for Feature Selection »
Taisuke Yasuda · Mohammad Hossein Bateni · Lin Chen · Matthew Fahrbach · Thomas Fu · Vahab Mirrokni -
2023 : Tensor Proxies for Efficient Feature Cross Search »
Taisuke Yasuda · Mohammad Hossein Bateni · Lin Chen · Matthew Fahrbach · Thomas Fu -
2023 Poster: Improved Algorithms for White-Box Adversarial Streams »
Ying Feng · David Woodruff -
2023 Poster: Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization »
Ameya Velingker · Maximilian Vötsch · David Woodruff · Samson Zhou -
2022 Poster: Quantum-Inspired Algorithms from Randomized Numerical Linear Algebra »
Nadiia Chepurko · Kenneth Clarkson · Lior Horesh · Honghao Lin · David Woodruff -
2022 Spotlight: Quantum-Inspired Algorithms from Randomized Numerical Linear Algebra »
Nadiia Chepurko · Kenneth Clarkson · Lior Horesh · Honghao Lin · David Woodruff -
2019 Poster: Dimensionality Reduction for Tukey Regression »
Kenneth Clarkson · Ruosong Wang · David Woodruff -
2019 Poster: Faster Algorithms for Binary Matrix Factorization »
Ravi Kumar · Rina Panigrahy · Ali Rahimi · David Woodruff -
2019 Poster: Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel $k$-means Clustering »
Taisuke Yasuda · David Woodruff · Manuel Fernandez -
2019 Oral: Faster Algorithms for Binary Matrix Factorization »
Ravi Kumar · Rina Panigrahy · Ali Rahimi · David Woodruff -
2019 Oral: Dimensionality Reduction for Tukey Regression »
Kenneth Clarkson · Ruosong Wang · David Woodruff -
2019 Oral: Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel $k$-means Clustering »
Taisuke Yasuda · David Woodruff · Manuel Fernandez -
2018 Poster: Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order »
Vladimir Braverman · Stephen Chestnut · Robert Krauthgamer · Yi Li · David Woodruff · Lin Yang -
2018 Oral: Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order »
Vladimir Braverman · Stephen Chestnut · Robert Krauthgamer · Yi Li · David Woodruff · Lin Yang -
2018 Poster: Leveraging Well-Conditioned Bases: Streaming and Distributed Summaries in Minkowski $p$-Norms »
Charlie Dickens · Graham Cormode · David Woodruff -
2018 Oral: Leveraging Well-Conditioned Bases: Streaming and Distributed Summaries in Minkowski $p$-Norms »
Charlie Dickens · Graham Cormode · David Woodruff