Recent empirical work shows that large deep neural networks are often highly redundant and that one can find much smaller subnetworks without a significant drop in accuracy. However, most existing network pruning methods are empirical and heuristic, leaving it open whether good subnetworks provably exist, how to find them efficiently, and whether network pruning can be provably better than direct training using gradient descent. We answer these questions positively by proposing a simple greedy selection approach for finding good subnetworks, which starts from an empty network and greedily adds important neurons from the large network. This differs from existing methods based on backward elimination, which remove redundant neurons from the large network. Theoretically, applying the greedy selection strategy to sufficiently large pre-trained networks is guaranteed to find small subnetworks with lower loss than networks directly trained with gradient descent. Our results also apply to pruning randomly weighted networks. Practically, we improve on the prior art in network pruning for learning compact neural architectures on ImageNet, including ResNet, MobileNetV2/V3, and ProxylessNet. Our theory and empirical results on MobileNet suggest that we should fine-tune the pruned subnetworks to leverage the information from the large model, instead of re-training from a new random initialization as suggested in Liu et al. (2018).
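As a rough illustration of the greedy forward selection idea described in the abstract, the following minimal sketch builds a subnetwork from an empty set by repeatedly adding the neuron that most reduces the loss. It is not the authors' implementation. The function names, the mean-squared-error loss, the averaging of selected neuron outputs, and the choice to let a neuron be picked more than once are all assumptions made for this example.

```python
def mse(pred, target):
    """Mean squared error between two equal-length lists of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def greedy_forward_selection(neuron_outputs, targets, k):
    """Pick k neurons greedily, starting from an empty subnetwork.

    neuron_outputs[i][j] is the (precomputed) output of neuron i of the
    large network on example j. The subnetwork prediction is assumed to be
    the average of the selected neurons' outputs (an illustrative choice).
    Returns the selected indices and the loss of the final subnetwork.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    selected = []
    running_sum = [0.0] * len(targets)  # sum of selected neurons' outputs
    best_loss = float("inf")
    for _ in range(k):
        best_idx, best_loss = None, float("inf")
        size = len(selected) + 1
        # Try adding each candidate neuron and keep the one with lowest loss.
        for i, out in enumerate(neuron_outputs):
            pred = [(s + o) / size for s, o in zip(running_sum, out)]
            loss = mse(pred, targets)
            if loss < best_loss:
                best_idx, best_loss = i, loss
        selected.append(best_idx)
        running_sum = [s + o for s, o in zip(running_sum, neuron_outputs[best_idx])]
    return selected, best_loss


# Toy usage: three candidate neurons, two examples, target output 0.5 on both.
outputs = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
selected, loss = greedy_forward_selection(outputs, [0.5, 0.5], k=2)
# selected == [0, 1], loss == 0.0
```

In the toy run, neither of the first two neurons fits the target alone, but their average does, so the greedy procedure picks them in turn and ignores the third. Backward elimination would instead start from all three neurons and remove the least useful one.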
Author Information
Mao Ye (UT Austin)
Chengyue Gong (University of Texas at Austin)
Lizhen Nie (The University of Chicago)
Denny Zhou (Google Brain)
Adam Klivans (University of Texas at Austin)
Qiang Liu (UT Austin)
More from the Same Authors
- 2020 Poster: Go Wide, Then Narrow: Efficient Training of Deep Thin Networks
  Denny Zhou · Mao Ye · Chen Chen · Tianjian Meng · Mingxing Tan · Xiaodan Song · Quoc Le · Qiang Liu · Dale Schuurmans
- 2020 Poster: Accountable Off-Policy Evaluation With Kernel Bellman Statistics
  Yihao Feng · Tongzheng Ren · Ziyang Tang · Qiang Liu
- 2020 Poster: Superpolynomial Lower Bounds for Learning One-Layer Neural Networks using Gradient Descent
  Surbhi Goel · Aravind Gollakota · Zhihan Jin · Sushrut Karmalkar · Adam Klivans
- 2020 Poster: A Chance-Constrained Generative Framework for Sequence Optimization
  Xianggen Liu · Qiang Liu · Sen Song · Jian Peng
- 2019 Workshop: Stein’s Method for Machine Learning and Statistics
  Francois-Xavier Briol · Lester Mackey · Chris Oates · Qiang Liu · Larry Goldstein
- 2019 Poster: Improving Neural Language Modeling via Adversarial Training
  Dilin Wang · Chengyue Gong · Qiang Liu
- 2019 Oral: Improving Neural Language Modeling via Adversarial Training
  Dilin Wang · Chengyue Gong · Qiang Liu
- 2019 Poster: Quantile Stein Variational Gradient Descent for Batch Bayesian Optimization
  Chengyue Gong · Jian Peng · Qiang Liu
- 2019 Poster: Nonlinear Stein Variational Gradient Descent for Learning Diversified Mixture Models
  Dilin Wang · Qiang Liu
- 2019 Oral: Quantile Stein Variational Gradient Descent for Batch Bayesian Optimization
  Chengyue Gong · Jian Peng · Qiang Liu
- 2019 Oral: Nonlinear Stein Variational Gradient Descent for Learning Diversified Mixture Models
  Dilin Wang · Qiang Liu
- 2018 Poster: Learning to Explore via Meta-Policy Gradient
  Tianbing Xu · Qiang Liu · Liang Zhao · Jian Peng
- 2018 Poster: Stein Variational Gradient Descent Without Gradient
  Jun Han · Qiang Liu
- 2018 Oral: Stein Variational Gradient Descent Without Gradient
  Jun Han · Qiang Liu
- 2018 Oral: Learning to Explore via Meta-Policy Gradient
  Tianbing Xu · Qiang Liu · Liang Zhao · Jian Peng
- 2018 Poster: Goodness-of-fit Testing for Discrete Distributions via Stein Discrepancy
  Jiasen Yang · Qiang Liu · Vinayak A Rao · Jennifer Neville
- 2018 Poster: Learning One Convolutional Layer with Overlapping Patches
  Surbhi Goel · Adam Klivans · Raghu Meka
- 2018 Poster: Stein Variational Message Passing for Continuous Graphical Models
  Dilin Wang · Zhe Zeng · Qiang Liu
- 2018 Oral: Goodness-of-fit Testing for Discrete Distributions via Stein Discrepancy
  Jiasen Yang · Qiang Liu · Vinayak A Rao · Jennifer Neville
- 2018 Oral: Stein Variational Message Passing for Continuous Graphical Models
  Dilin Wang · Zhe Zeng · Qiang Liu
- 2018 Oral: Learning One Convolutional Layer with Overlapping Patches
  Surbhi Goel · Adam Klivans · Raghu Meka
- 2018 Poster: Variable Selection via Penalized Neural Network: a Drop-Out-One Loss Approach
  Mao Ye · Yan Sun
- 2018 Oral: Variable Selection via Penalized Neural Network: a Drop-Out-One Loss Approach
  Mao Ye · Yan Sun
- 2017 Poster: Exact MAP Inference by Avoiding Fractional Vertices
  Erik Lindgren · Alexandros Dimakis · Adam Klivans
- 2017 Talk: Exact MAP Inference by Avoiding Fractional Vertices
  Erik Lindgren · Alexandros Dimakis · Adam Klivans