Even though the goal of pruning is often to reduce the computational resources consumed during training or inference, it comes as no surprise to theoreticians or practitioners that pruning also improves generalization. In this work, we empirically study pruning's effect on generalization, focusing on two state-of-the-art pruning algorithms: weight rewinding and learning-rate rewinding. However, each pruning algorithm is actually an aggregation of many design choices: a weight-scoring heuristic, a pruning schedule, and a learning-rate schedule, among other factors, each of which might contribute to generalization improvement in different ways. We thus ablate each design choice to determine whether it is responsible for pruning's effect on generalization. We find that each individual contribution is limited compared to the generalization improvement achieved with the pruning algorithm in its entirety. Our results also highlight similarities and differences between the effects on generalization caused by pruning and by model-width scaling.
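To make the two baseline procedures concrete, below is a minimal sketch of iterative magnitude pruning with weight rewinding versus learning-rate rewinding on a toy regression problem. It is an illustration only, not the implementation studied in the paper: the toy model, the plain-SGD `train` loop, and names such as `global_magnitude_mask`, `LR_SCHEDULE`, and `PRUNE_FRACTION` are assumptions made for exposition.

```python
# Minimal sketch (assumed, not the paper's implementation) of iterative magnitude
# pruning with two retraining variants:
#   - weight rewinding: after each pruning round, surviving weights are reset to
#     an earlier checkpoint (here, the initialization) and retrained with the
#     original learning-rate schedule;
#   - learning-rate rewinding: trained weights are kept, and only the
#     learning-rate schedule is rewound and replayed.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 32))
true_w = rng.normal(size=32) * (rng.random(32) < 0.25)  # sparse ground truth
y = X @ true_w + 0.01 * rng.normal(size=256)

LR_SCHEDULE = np.linspace(0.1, 0.001, 200)  # decaying schedule, replayed on each rewind
PRUNE_FRACTION = 0.2                        # prune 20% of surviving weights per round
w_init = 0.1 * rng.normal(size=32)

def train(w, mask, lr_schedule):
    """Plain SGD on squared error; pruned weights are held at zero by `mask`."""
    for lr in lr_schedule:
        grad = 2.0 / len(X) * X.T @ (X @ (w * mask) - y)
        w = (w - lr * grad) * mask
    return w

def global_magnitude_mask(w, mask, fraction):
    """Prune the smallest-magnitude surviving weights (global magnitude threshold)."""
    threshold = np.quantile(np.abs(w[mask > 0]), fraction)
    return mask * (np.abs(w) > threshold)

def iterative_pruning(rewind_weights, rounds=5):
    mask = np.ones(32)
    w = train(w_init.copy(), mask, LR_SCHEDULE)
    for _ in range(rounds):
        mask = global_magnitude_mask(w, mask, PRUNE_FRACTION)
        if rewind_weights:
            # Weight rewinding: reset surviving weights to the rewind point.
            w = w_init * mask
        # Both variants replay the original (rewound) learning-rate schedule.
        w = train(w, mask, LR_SCHEDULE)
    return w, mask

for rewind_weights in (True, False):
    w, mask = iterative_pruning(rewind_weights)
    name = "weight rewinding" if rewind_weights else "LR rewinding"
    print(f"{name}: sparsity {1 - mask.mean():.2f}, "
          f"train MSE {np.mean((X @ w - y) ** 2):.4f}")
```

In the paper's setting, the retraining loop and rewind point would correspond to the full training pipeline and an early-training checkpoint rather than the toy SGD loop and initialization used here.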
Author Information
Tian Jin (MIT)
Gintare Karolina Dziugaite (Element AI)
Michael Carbin (MIT)
More from the Same Authors
- 2021: Studying the Consistency and Composability of Lottery Ticket Pruning Masks
  Rajiv Movva · Michael Carbin · Jonathan Frankle
- 2021: Towards a Unified Information-Theoretic Framework for Generalization
  Mahdi Haghifam · Gintare Karolina Dziugaite · Shay Moran
- 2021: On the Generalization Improvement from Neural Network Pruning
  Tian Jin · Gintare Karolina Dziugaite · Michael Carbin
- 2022: Pre-Training on a Data Diet: Identifying Sufficient Examples for Early Training
  Mansheej Paul · Brett Larsen · Surya Ganguli · Jonathan Frankle · Gintare Karolina Dziugaite
- 2023: Flat minima can fail to transfer to downstream tasks
  Deepansha Singh · Ekansh Sharma · Daniel Roy · Gintare Karolina Dziugaite
- 2023: Distributions for Compositionally Differentiating Parametric Discontinuities
  Jesse Michel · Kevin Mu · Xuanda Yang · Sai Praveen Bangaru · Elias Rojas Collins · Gilbert Bernstein · Jonathan Ragan-Kelley · Michael Carbin · Tzu-Mao Li
- 2023: Can LLMs Generate Random Numbers? Evaluating LLM Sampling in Controlled Domains
  Aspen Hopkins · Alex Renda · Michael Carbin
- 2023: Invited talk: Lessons Learned from Studying PAC-Bayes and Generalization
  Gintare Karolina Dziugaite
- 2022: Finding Structured Winning Tickets with Early Pruning
  Udbhav Bamba · Devin Kwok · Gintare Karolina Dziugaite · David Rolnick
- 2021 Poster: On the Predictability of Pruning Across Scales
  Jonathan Rosenfeld · Jonathan Frankle · Michael Carbin · Nir Shavit
- 2021 Spotlight: On the Predictability of Pruning Across Scales
  Jonathan Rosenfeld · Jonathan Frankle · Michael Carbin · Nir Shavit
- 2020 Poster: Generalization via Derandomization
  Jeffrey Negrea · Gintare Karolina Dziugaite · Daniel Roy
- 2020 Poster: Linear Mode Connectivity and the Lottery Ticket Hypothesis
  Jonathan Frankle · Gintare Karolina Dziugaite · Daniel Roy · Michael Carbin
- 2019 Poster: Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks
  Charith Mendis · Alex Renda · Saman Amarasinghe · Michael Carbin
- 2019 Oral: Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks
  Charith Mendis · Alex Renda · Saman Amarasinghe · Michael Carbin
- 2018 Poster: Entropy-SGD optimizes the prior of a PAC-Bayes bound: Generalization properties of Entropy-SGD and data-dependent priors
  Gintare Karolina Dziugaite · Daniel Roy
- 2018 Oral: Entropy-SGD optimizes the prior of a PAC-Bayes bound: Generalization properties of Entropy-SGD and data-dependent priors
  Gintare Karolina Dziugaite · Daniel Roy