Coresets for Classification – Simplified and Strengthened
Tung Mai · Anup Rao · Cameron Musco
Sat Jul 24 12:37 PM -- 12:42 PM (PDT)
We give relative error coresets for training linear classifiers with a broad class of loss functions, including the logistic loss and hinge loss. Our construction achieves $(1\pm \epsilon)$ relative error with $\tilde O(d \cdot \mu_y(X)^2/\epsilon^2)$ points, where $\mu_y(X)$ is a natural complexity measure of the data matrix $X \in \mathbb{R}^{n \times d}$ and label vector $y \in \{-1,1\}^n$, introduced by Munteanu et al. (2018). Our result is based on subsampling data points with probabilities proportional to their $\ell_1$ Lewis weights. It significantly improves on existing theoretical bounds and performs well in practice, outperforming uniform subsampling along with other importance sampling methods. Our sampling distribution does not depend on the labels, so it can be used for active learning. It also does not depend on the specific loss function, so a single coreset can be used in multiple training scenarios.
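The sampling step described in the abstract can be illustrated with a minimal sketch. It assumes the $\ell_1$ Lewis weights are approximated by the fixed-point iteration $w_i \leftarrow (x_i^\top (X^\top W^{-1} X)^{-1} x_i)^{1/2}$, a common way to compute them that the abstract itself does not specify. The function names `l1_lewis_weights` and `sample_coreset`, the iteration count `n_iter`, sampling with replacement, and the $1/(m p_i)$ rescaling are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def l1_lewis_weights(X, n_iter=30):
    """Approximate l1 Lewis weights of the rows of X (hypothetical helper).

    Assumes the fixed-point iteration w_i <- sqrt(x_i^T (X^T W^{-1} X)^{-1} x_i),
    with W = diag(w). For full-rank X the weights sum to d at the fixed point.
    """
    n, d = X.shape
    w = np.ones(n)
    for _ in range(n_iter):
        # M = X^T W^{-1} X, built by scaling each row x_i by 1 / w_i.
        M = X.T @ (X / w[:, None])
        M_inv = np.linalg.pinv(M)
        # tau_i = x_i^T M^{-1} x_i for every row i.
        tau = np.einsum("ij,jk,ik->i", X, M_inv, X)
        # Clamp away from zero so all-zero rows do not cause division by zero.
        w = np.maximum(np.sqrt(np.maximum(tau, 0.0)), 1e-12)
    return w


def sample_coreset(X, m, rng=None):
    """Sample m row indices with probability proportional to Lewis weights.

    Returns the sampled indices and importance weights 1 / (m * p_i), so that a
    weighted sum of per-point losses over the sample is an unbiased estimate of
    the full sum. The labels are never used, matching the abstract's remark
    that the sampling distribution is label-independent.
    """
    rng = np.random.default_rng(rng)
    w = l1_lewis_weights(X)
    p = w / w.sum()
    idx = rng.choice(X.shape[0], size=m, replace=True, p=p)
    scale = 1.0 / (m * p[idx])
    return idx, scale
```

Under these assumptions, a downstream trainer would minimize the weighted loss $\sum_j \text{scale}_j \cdot f(y_{idx_j} x_{idx_j}^\top \beta)$ for whichever loss $f$ is of interest (logistic, hinge, and so on), reusing the same `idx` and `scale` across losses.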
Author Information
Tung Mai (Adobe Research)
Anup Rao (Adobe Research)
Cameron Musco (UMass)
More from the Same Authors
- 2021: Coresets for Classification – Simplified and Strengthened
  Anup Rao · Tung Mai · Cameron Musco
- 2022 Poster: One-Pass Algorithms for MAP Inference of Nonsymmetric Determinantal Point Processes
  Aravind Reddy · Ryan A. Rossi · Zhao Song · Anup Rao · Tung Mai · Nedim Lipka · Gang Wu · Eunyee Koh · Nesreen K Ahmed
- 2022 Poster: Online Balanced Experimental Design
  David Arbour · Drew Dimmery · Tung Mai · Anup Rao
- 2022 Spotlight: Online Balanced Experimental Design
  David Arbour · Drew Dimmery · Tung Mai · Anup Rao
- 2022 Spotlight: One-Pass Algorithms for MAP Inference of Nonsymmetric Determinantal Point Processes
  Aravind Reddy · Ryan A. Rossi · Zhao Song · Anup Rao · Tung Mai · Nedim Lipka · Gang Wu · Eunyee Koh · Nesreen K Ahmed
- 2021 Poster: Asymptotics of Ridge Regression in Convolutional Models
  Mojtaba Sahraee-Ardakan · Tung Mai · Anup Rao · Ryan A. Rossi · Sundeep Rangan · Alyson Fletcher
- 2021 Spotlight: Asymptotics of Ridge Regression in Convolutional Models
  Mojtaba Sahraee-Ardakan · Tung Mai · Anup Rao · Ryan A. Rossi · Sundeep Rangan · Alyson Fletcher
- 2021 Poster: Faster Kernel Matrix Algebra via Density Estimation
  Arturs Backurs · Piotr Indyk · Cameron Musco · Tal Wagner
- 2021 Spotlight: Faster Kernel Matrix Algebra via Density Estimation
  Arturs Backurs · Piotr Indyk · Cameron Musco · Tal Wagner
- 2021 Poster: Fundamental Tradeoffs in Distributionally Adversarial Training
  Mohammad Mehrabi · Adel Javanmard · Ryan A. Rossi · Anup Rao · Tung Mai
- 2021 Spotlight: Fundamental Tradeoffs in Distributionally Adversarial Training
  Mohammad Mehrabi · Adel Javanmard · Ryan A. Rossi · Anup Rao · Tung Mai
- 2020 Poster: Efficient Intervention Design for Causal Discovery with Latents
  Raghavendra Addanki · Shiva Kasiviswanathan · Andrew McGregor · Cameron Musco