In high dimensions, most machine learning methods are brittle to even a small fraction of structured outliers. To address this, we introduce a new meta-algorithm that takes a base learner, such as least squares or stochastic gradient descent, and hardens it to be resistant to outliers. Our method, Sever, possesses strong theoretical guarantees yet is also highly scalable: beyond running the base learner itself, it only requires computing the top singular vector of a certain n×d matrix. We apply Sever to a drug design dataset and a spam classification dataset, and find that in both cases it has substantially greater robustness than several baselines. On the spam dataset, with 1% corruptions, we achieved 7.4% test error, compared to 13.4%-20.5% for the baselines and 3% error on the uncorrupted dataset. Similarly, on the drug design dataset, with 10% corruptions, we achieved a test mean-squared error of 1.42, compared to 1.51-2.33 for the baselines and 1.23 on the uncorrupted dataset.
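The loop described in the abstract (run the base learner, then compute one top singular vector of an n×d matrix) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not say which n×d matrix is used, so the code assumes it is the matrix of centered per-sample gradients, and the rule of dropping a fixed fraction of the highest-scoring points each round is likewise an assumption. The names `fit_least_squares`, `outlier_scores`, `sever_sketch`, `rounds`, and `drop_frac` are hypothetical.

```python
import numpy as np

def fit_least_squares(X, y, ridge=1e-8):
    """Base learner: ordinary least squares with a tiny ridge term for stability."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)

def outlier_scores(X, y, w):
    """Score each point by its squared projection onto the top singular
    direction of the centered per-sample gradient matrix (assumed n x d matrix)."""
    residuals = X @ w - y
    grads = residuals[:, None] * X           # gradient of 0.5 * (x.w - y)^2 per sample
    centered = grads - grads.mean(axis=0)    # subtract the mean gradient
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top_direction = vt[0]                    # top right singular vector
    return (centered @ top_direction) ** 2

def sever_sketch(X, y, rounds=4, drop_frac=0.05):
    """Alternate between fitting the base learner and removing the points
    with the largest scores (assumed filtering rule)."""
    keep = np.arange(X.shape[0])
    for _ in range(rounds):
        w = fit_least_squares(X[keep], y[keep])
        scores = outlier_scores(X[keep], y[keep], w)
        n_drop = int(drop_frac * keep.size)
        if n_drop == 0:
            break
        # Keep the indices with the smallest scores.
        keep = keep[np.argsort(scores)[: keep.size - n_drop]]
    return fit_least_squares(X[keep], y[keep]), keep
```

Calling `sever_sketch(X, y)` returns the refit weights and the indices of the retained points. In this sketch the only work beyond refitting the base learner is one SVD per round, which matches the cost profile the abstract describes.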
Author Information
Ilias Diakonikolas (USC)
Gautam Kamath (MIT)
Daniel Kane (UCSD)
Jerry Li (Microsoft Research)
Jacob Steinhardt (University of California, Berkeley)
Alistair Stewart (University of Southern California)
Related Events (a corresponding poster, oral, or spotlight)
-
2019 Oral: Sever: A Robust Meta-Algorithm for Stochastic Optimization »
Wed. Jun 12th 11:25 -- 11:30 PM Room Seaside Ballroom
More from the Same Authors
-
2021 : Enabling Fast Differentially Private SGD via Just-in-Time Compilation and Vectorization »
Pranav Subramani · Nicholas Vadivelu · Gautam Kamath -
2021 : Remember What You Want to Forget: Algorithms for Machine Unlearning »
Ayush Sekhari · Jayadev Acharya · Gautam Kamath · Ananda Theertha Suresh -
2021 : The Role of Adaptive Optimizers for Honest Private Hyperparameter Selection »
Shubhankar Mohapatra · Sajin Sasy · Gautam Kamath · Xi He · Om Dipakbhai Thakkar -
2021 : Unbiased Statistical Estimation and Valid Confidence Sets Under Differential Privacy »
Christian Covington · Xi He · James Honaker · Gautam Kamath -
2021 : Improved Rates for Differentially Private Stochastic Convex Optimization with Heavy-Tailed Data »
Gautam Kamath · Xingtu Liu · Huanyu Zhang -
2023 Social: Black in AI »
Black in AI Events · Kalesha Bullard · Stacy Fay Hobson · Gautam Kamath -
2023 Poster: Near-Optimal Cryptographic Hardness of Agnostically Learning Halfspaces and ReLU Regression under Gaussian Marginals »
Ilias Diakonikolas · Daniel Kane · Lisheng Ren -
2023 Poster: Automatically Auditing Large Language Models via Discrete Optimization »
Erik Jones · Anca Dragan · Aditi Raghunathan · Jacob Steinhardt -
2023 Poster: Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations »
Yongyi Yang · Jacob Steinhardt · Wei Hu -
2023 Poster: Exploring the Limits of Model-Targeted Indiscriminate Data Poisoning Attacks »
Yiwei Lu · Gautam Kamath · Yaoliang Yu -
2023 Poster: Nearly-Linear Time and Streaming Algorithms for Outlier-Robust PCA »
Ilias Diakonikolas · Daniel Kane · Ankit Pensia · Thanasis Pittas -
2022 Workshop: Updatable Machine Learning »
Ayush Sekhari · Gautam Kamath · Jayadev Acharya -
2022 Workshop: Theory and Practice of Differential Privacy »
Gautam Kamath · Audra McMillan -
2022 Poster: Improved Rates for Differentially Private Stochastic Convex Optimization with Heavy-Tailed Data »
Gautam Kamath · Xingtu Liu · Huanyu Zhang -
2022 Poster: Streaming Algorithms for High-Dimensional Robust Statistics »
Ilias Diakonikolas · Daniel Kane · Ankit Pensia · Thanasis Pittas -
2022 Oral: Improved Rates for Differentially Private Stochastic Convex Optimization with Heavy-Tailed Data »
Gautam Kamath · Xingtu Liu · Huanyu Zhang -
2022 Spotlight: Streaming Algorithms for High-Dimensional Robust Statistics »
Ilias Diakonikolas · Daniel Kane · Ankit Pensia · Thanasis Pittas -
2021 Workshop: Theory and Practice of Differential Privacy »
Rachel Cummings · Gautam Kamath -
2021 : Opening Remarks »
Gautam Kamath · Rachel Cummings -
2021 Poster: PAPRIKA: Private Online False Discovery Rate Control »
Wanrong Zhang · Gautam Kamath · Rachel Cummings -
2021 Spotlight: PAPRIKA: Private Online False Discovery Rate Control »
Wanrong Zhang · Gautam Kamath · Rachel Cummings -
2020 Poster: High-dimensional Robust Mean Estimation via Gradient Descent »
Yu Cheng · Ilias Diakonikolas · Rong Ge · Mahdi Soltanolkotabi -
2020 Poster: Privately Learning Markov Random Fields »
Huanyu Zhang · Gautam Kamath · Janardhan Kulkarni · Steven Wu -
2020 Poster: Identifying Statistical Bias in Dataset Replication »
Logan Engstrom · Andrew Ilyas · Shibani Santurkar · Dimitris Tsipras · Jacob Steinhardt · Aleksander Madry -
2020 Poster: Efficiently Learning Adversarially Robust Halfspaces with Noise »
Omar Montasser · Surbhi Goel · Ilias Diakonikolas · Nati Srebro -
2019 Workshop: Workshop on the Security and Privacy of Machine Learning »
Nicolas Papernot · Florian Tramer · Bo Li · Dan Boneh · David Evans · Somesh Jha · Percy Liang · Patrick McDaniel · Jacob Steinhardt · Dawn Song -
2018 Poster: On the Limitations of First-Order Approximation in GAN Dynamics »
Jerry Li · Aleksander Madry · John Peebles · Ludwig Schmidt -
2018 Oral: On the Limitations of First-Order Approximation in GAN Dynamics »
Jerry Li · Aleksander Madry · John Peebles · Ludwig Schmidt -
2018 Poster: INSPECTRE: Privately Estimating the Unseen »
Jayadev Acharya · Gautam Kamath · Ziteng Sun · Huanyu Zhang -
2018 Oral: INSPECTRE: Privately Estimating the Unseen »
Jayadev Acharya · Gautam Kamath · Ziteng Sun · Huanyu Zhang -
2017 Poster: Priv’IT: Private and Sample Efficient Identity Testing »
Bryan Cai · Constantinos Daskalakis · Gautam Kamath -
2017 Poster: Being Robust (in High Dimensions) Can Be Practical »
Ilias Diakonikolas · Gautam Kamath · Daniel Kane · Jerry Li · Ankur Moitra · Alistair Stewart -
2017 Poster: ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning »
Hantian Zhang · Jerry Li · Kaan Kara · Dan Alistarh · Ji Liu · Ce Zhang -
2017 Talk: ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning »
Hantian Zhang · Jerry Li · Kaan Kara · Dan Alistarh · Ji Liu · Ce Zhang -
2017 Talk: Priv’IT: Private and Sample Efficient Identity Testing »
Bryan Cai · Constantinos Daskalakis · Gautam Kamath -
2017 Talk: Being Robust (in High Dimensions) Can Be Practical »
Ilias Diakonikolas · Gautam Kamath · Daniel Kane · Jerry Li · Ankur Moitra · Alistair Stewart