Soft Threshold Weight Reparameterization for Learnable Sparsity
Sparsity in Deep Neural Networks (DNNs) has been studied extensively, with the focus on maximizing prediction accuracy given an overall parameter budget. Existing methods rely on uniform or heuristic non-uniform sparsity budgets whose sub-optimal layer-wise parameter allocation results in (a) lower prediction accuracy or (b) higher inference cost (FLOPs). This work proposes Soft Threshold Reparameterization (STR), a novel use of the soft-threshold operator on DNN weights. STR smoothly induces sparsity while learning pruning thresholds, thereby obtaining a non-uniform sparsity budget. Our method achieves state-of-the-art accuracy for unstructured sparsity in CNNs (ResNet50 and MobileNetV1 on ImageNet-1K) and, additionally, learns non-uniform budgets that empirically reduce FLOPs by up to 50%. Notably, STR boosts accuracy over existing results by up to 10% in the ultra-sparse (99%) regime and can also be used to induce low-rank (structured) sparsity in RNNs. In short, STR is a simple mechanism that learns effective sparsity budgets, in contrast with popular heuristics. Code, pretrained models, and sparsity budgets are available at https://github.com/RAIVNLab/STR.
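To make the reparameterization concrete, below is a minimal PyTorch sketch of the idea the abstract describes: each layer's effective weight is the soft-thresholded version of its stored weight, with the threshold derived from a learnable scalar. This is an illustration, not the authors' released implementation (see the linked repository for that); the class name STRConv2d and the s_init default are assumptions, and sigmoid is one choice for the threshold function g.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STRConv2d(nn.Conv2d):
    """Convolution whose effective weight is soft-thresholded:
    W_hat = sign(W) * ReLU(|W| - g(s)), with a learnable scalar s per layer."""

    def __init__(self, *args, s_init=-8.0, **kwargs):
        super().__init__(*args, **kwargs)
        # One threshold parameter per layer. sigmoid(s_init) is near 0, so
        # training starts dense; regularization can then raise the threshold.
        self.s = nn.Parameter(torch.tensor(s_init))

    def sparse_weight(self):
        thresh = torch.sigmoid(self.s)  # g(s): maps s to a positive threshold
        # Soft-threshold operator: entries with |W| <= g(s) become exactly 0,
        # and the remaining entries shrink toward zero by the threshold.
        return torch.sign(self.weight) * F.relu(torch.abs(self.weight) - thresh)

    def forward(self, x):
        return F.conv2d(x, self.sparse_weight(), self.bias,
                        self.stride, self.padding, self.dilation, self.groups)
```

Because every layer learns its own s by backpropagation, the per-layer sparsity levels need not match, which is how a non-uniform sparsity budget emerges instead of being fixed by a heuristic.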
Author Information
Aditya Kusupati (University of Washington)
Vivek Ramanujan (Allen Institute for Artificial Intelligence)
Raghav Somani (University of Washington)
I am broadly interested in aspects of large-scale optimization and probability theory that arise in fundamental machine learning.
Mitchell Wortsman (University of Washington)
Prateek Jain (Microsoft Research)
Sham Kakade (University of Washington)
Sham Kakade is a Gordon McKay Professor of Computer Science and Statistics at Harvard University and a co-director of the recently announced Kempner Institute. He works on the mathematical foundations of machine learning and AI. His thesis helped lay the statistical foundations of reinforcement learning. With his collaborators, his additional contributions include: one of the first provably efficient policy search methods for reinforcement learning, Conservative Policy Iteration; the mathematical foundations of the widely used linear bandit and Gaussian process bandit models; tensor and spectral methodologies for provable estimation of latent variable models; and the first sharp analysis of the perturbed gradient descent algorithm, along with the design and analysis of numerous other convex and non-convex algorithms. He is the recipient of the ICML Test of Time Award (2020), the IBM Pat Goldberg Best Paper Award (2007), and the INFORMS Revenue Management and Pricing Prize (2014), and was program chair for COLT 2011. Sham was an undergraduate at Caltech, where he studied physics and worked under the guidance of John Preskill in quantum computing. He completed his Ph.D. in computational neuroscience at the Gatsby Unit, University College London, under the supervision of Peter Dayan, and was a postdoc in the Department of Computer Science at the University of Pennsylvania, where he broadened his studies to include computational game theory and economics under the guidance of Michael Kearns. Sham has been a Principal Research Scientist at Microsoft Research New England, an associate professor in the Department of Statistics at the Wharton School, University of Pennsylvania, and an assistant professor at the Toyota Technological Institute at Chicago.
Ali Farhadi (University of Washington, Allen Institute for AI)
More from the Same Authors
- 2021 : Disrupting Model Training with Adversarial Shortcuts »
  Ivan Evtimov · Ian Covert · Aditya Kusupati · Tadayoshi Kohno
- 2021 : A Short Note on the Relationship of Information Gain and Eluder Dimension »
  Kaixuan Huang · Sham Kakade · Jason Lee · Qi Lei
- 2021 : Sparsity in the Partially Controllable LQR »
  Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang
- 2022 Poster: Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time »
  Mitchell Wortsman · Gabriel Ilharco · Samir Gadre · Becca Roelofs · Raphael Gontijo Lopes · Ari Morcos · Hongseok Namkoong · Ali Farhadi · Yair Carmon · Simon Kornblith · Ludwig Schmidt
- 2022 Spotlight: Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time »
  Mitchell Wortsman · Gabriel Ilharco · Samir Gadre · Becca Roelofs · Raphael Gontijo Lopes · Ari Morcos · Hongseok Namkoong · Ali Farhadi · Yair Carmon · Simon Kornblith · Ludwig Schmidt
- 2021 Poster: Defense against backdoor attacks via robust covariance estimation »
  Jonathan Hayase · Weihao Kong · Raghav Somani · Sewoong Oh
- 2021 Spotlight: Defense against backdoor attacks via robust covariance estimation »
  Jonathan Hayase · Weihao Kong · Raghav Somani · Sewoong Oh
- 2021 Poster: How Important is the Train-Validation Split in Meta-Learning? »
  Yu Bai · Minshuo Chen · Pan Zhou · Tuo Zhao · Jason Lee · Sham Kakade · Huan Wang · Caiming Xiong
- 2021 Spotlight: How Important is the Train-Validation Split in Meta-Learning? »
  Yu Bai · Minshuo Chen · Pan Zhou · Tuo Zhao · Jason Lee · Sham Kakade · Huan Wang · Caiming Xiong
- 2021 Poster: Bilinear Classes: A Structural Framework for Provable Generalization in RL »
  Simon Du · Sham Kakade · Jason Lee · Shachar Lovett · Gaurav Mahajan · Wen Sun · Ruosong Wang
- 2021 Poster: Instabilities of Offline RL with Pre-Trained Neural Representation »
  Ruosong Wang · Yifan Wu · Ruslan Salakhutdinov · Sham Kakade
- 2021 Spotlight: Instabilities of Offline RL with Pre-Trained Neural Representation »
  Ruosong Wang · Yifan Wu · Ruslan Salakhutdinov · Sham Kakade
- 2021 Oral: Bilinear Classes: A Structural Framework for Provable Generalization in RL »
  Simon Du · Sham Kakade · Jason Lee · Shachar Lovett · Gaurav Mahajan · Wen Sun · Ruosong Wang
- 2021 Poster: Learning Neural Network Subspaces »
  Mitchell Wortsman · Maxwell Horton · Carlos Guestrin · Ali Farhadi · Mohammad Rastegari
- 2021 Spotlight: Learning Neural Network Subspaces »
  Mitchell Wortsman · Maxwell Horton · Carlos Guestrin · Ali Farhadi · Mohammad Rastegari
- 2020 : QA for invited talk 8 Kakade »
  Sham Kakade
- 2020 : Invited talk 8 Kakade »
  Sham Kakade
- 2020 : Discussion Panel »
  Krzysztof Dembczynski · Prateek Jain · Alina Beygelzimer · Inderjit Dhillon · Anna Choromanska · Maryam Majzoubi · Yashoteja Prabhu · John Langford
- 2020 : Speaker Panel »
  Csaba Szepesvari · Martha White · Sham Kakade · Gergely Neu · Shipra Agrawal · Akshay Krishnamurthy
- 2020 : Exploration, Policy Gradient Methods, and the Deadly Triad - Sham Kakade »
  Sham Kakade
- 2020 Poster: Calibration, Entropy Rates, and Memory in Language Models »
  Mark Braverman · Xinyi Chen · Sham Kakade · Karthik Narasimhan · Cyril Zhang · Yi Zhang
- 2020 Poster: The Implicit and Explicit Regularization Effects of Dropout »
  Colin Wei · Sham Kakade · Tengyu Ma
- 2020 Poster: Provable Representation Learning for Imitation Learning via Bi-level Optimization »
  Sanjeev Arora · Simon Du · Sham Kakade · Yuping Luo · Nikunj Umesh Saunshi
- 2020 Poster: Optimization and Analysis of the pAp@k Metric for Recommender Systems »
  Gaurush Hiranandani · Warut Vijitbenjaronk · Sanmi Koyejo · Prateek Jain
- 2020 Poster: Meta-learning for Mixed Linear Regression »
  Weihao Kong · Raghav Somani · Zhao Song · Sham Kakade · Sewoong Oh
- 2020 Poster: DROCC: Deep Robust One-Class Classification »
  Sachin Goyal · Aditi Raghunathan · Moksh Jain · Harsha Vardhan Simhadri · Prateek Jain
- 2020 Test Of Time: Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design »
  Niranjan Srinivas · Andreas Krause · Sham Kakade · Matthias Seeger
- 2019 : Keynote by Sham Kakade: Prediction, Learning, and Memory »
  Sham Kakade
- 2019 Poster: Online Control with Adversarial Disturbances »
  Naman Agarwal · Brian Bullins · Elad Hazan · Sham Kakade · Karan Singh
- 2019 Oral: Online Control with Adversarial Disturbances »
  Naman Agarwal · Brian Bullins · Elad Hazan · Sham Kakade · Karan Singh
- 2019 Poster: SGD without Replacement: Sharper Rates for General Smooth Convex Functions »
  Dheeraj Nagaraj · Prateek Jain · Praneeth Netrapalli
- 2019 Poster: Provably Efficient Maximum Entropy Exploration »
  Elad Hazan · Sham Kakade · Karan Singh · Abby Van Soest
- 2019 Oral: Provably Efficient Maximum Entropy Exploration »
  Elad Hazan · Sham Kakade · Karan Singh · Abby Van Soest
- 2019 Oral: SGD without Replacement: Sharper Rates for General Smooth Convex Functions »
  Dheeraj Nagaraj · Prateek Jain · Praneeth Netrapalli
- 2019 Poster: Online Meta-Learning »
  Chelsea Finn · Aravind Rajeswaran · Sham Kakade · Sergey Levine
- 2019 Poster: Maximum Likelihood Estimation for Learning Populations of Parameters »
  Ramya Korlakai Vinayak · Weihao Kong · Gregory Valiant · Sham Kakade
- 2019 Oral: Maximum Likelihood Estimation for Learning Populations of Parameters »
  Ramya Korlakai Vinayak · Weihao Kong · Gregory Valiant · Sham Kakade
- 2019 Oral: Online Meta-Learning »
  Chelsea Finn · Aravind Rajeswaran · Sham Kakade · Sergey Levine
- 2018 Poster: Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator »
  Maryam Fazel · Rong Ge · Sham Kakade · Mehran Mesbahi
- 2018 Oral: Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator »
  Maryam Fazel · Rong Ge · Sham Kakade · Mehran Mesbahi
- 2018 Poster: Differentially Private Matrix Completion Revisited »
  Prateek Jain · Om Dipakbhai Thakkar · Abhradeep Thakurta
- 2018 Oral: Differentially Private Matrix Completion Revisited »
  Prateek Jain · Om Dipakbhai Thakkar · Abhradeep Thakurta
- 2017 Workshop: ML on a budget: IoT, Mobile and other tiny-ML applications »
  Manik Varma · Venkatesh Saligrama · Prateek Jain
- 2017 Workshop: Principled Approaches to Deep Learning »
  Andrzej Pronobis · Robert Gens · Sham Kakade · Pedro Domingos
- 2017 Poster: ProtoNN: Compressed and Accurate kNN for Resource-scarce Devices »
  Chirag Gupta · Arun Suggala · Ankit Goyal · Saurabh Goyal · Ashish Kumar · Bhargavi Paranjape · Harsha Vardhan Simhadri · Raghavendra Udupa · Manik Varma · Prateek Jain
- 2017 Poster: How to Escape Saddle Points Efficiently »
  Chi Jin · Rong Ge · Praneeth Netrapalli · Sham Kakade · Michael Jordan
- 2017 Talk: How to Escape Saddle Points Efficiently »
  Chi Jin · Rong Ge · Praneeth Netrapalli · Sham Kakade · Michael Jordan
- 2017 Talk: ProtoNN: Compressed and Accurate kNN for Resource-scarce Devices »
  Chirag Gupta · Arun Suggala · Ankit Goyal · Saurabh Goyal · Ashish Kumar · Bhargavi Paranjape · Harsha Vardhan Simhadri · Raghavendra Udupa · Manik Varma · Prateek Jain
- 2017 Poster: Recovery Guarantees for One-hidden-layer Neural Networks »
  Kai Zhong · Zhao Song · Prateek Jain · Peter Bartlett · Inderjit Dhillon
- 2017 Poster: Nearly Optimal Robust Matrix Completion »
  Yeshwanth Cherapanamjeri · Prateek Jain · Kartik Gupta
- 2017 Poster: Active Heteroscedastic Regression »
  Kamalika Chaudhuri · Prateek Jain · Nagarajan Natarajan
- 2017 Talk: Active Heteroscedastic Regression »
  Kamalika Chaudhuri · Prateek Jain · Nagarajan Natarajan
- 2017 Talk: Nearly Optimal Robust Matrix Completion »
  Yeshwanth Cherapanamjeri · Prateek Jain · Kartik Gupta
- 2017 Talk: Recovery Guarantees for One-hidden-layer Neural Networks »
  Kai Zhong · Zhao Song · Prateek Jain · Peter Bartlett · Inderjit Dhillon