We study whether a neural network optimizes to the same, linearly connected minimum under different samples of SGD noise (e.g., random data order and augmentation). We find that standard vision models become stable to SGD noise in this sense early in training; from that point on, the outcome of optimization is determined up to a linearly connected region of the loss landscape. We use this technique to study iterative magnitude pruning (IMP), the procedure used in work on the lottery ticket hypothesis to identify subnetworks that could have trained in isolation to full accuracy. We find that these subnetworks only reach full accuracy when they are stable to SGD noise, which occurs either at initialization in small-scale settings (MNIST) or early in training in large-scale settings (ResNet-50 and Inception-v3 on ImageNet).
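The stability test described above amounts to training two copies of a network from the same weights under different samples of SGD noise and checking that the loss does not rise along the straight line between the two resulting parameter vectors. Below is a minimal sketch of that check, not the authors' released code: the helpers `train_from` and `evaluate_loss`, the function names, and the barrier threshold are all illustrative assumptions.

```python
# Sketch of the linear-connectivity stability check (illustrative, not the paper's code).
# State dicts map parameter names to float tensors/arrays that support scalar arithmetic.

def interpolate_states(state_a, state_b, alpha):
    """Linearly interpolate two state dicts: (1 - alpha) * a + alpha * b."""
    return {k: (1 - alpha) * state_a[k] + alpha * state_b[k] for k in state_a}

def linear_connectivity_barrier(state_a, state_b, evaluate_loss, num_points=11):
    """Largest rise in loss along the linear path, relative to the mean endpoint loss."""
    endpoint_mean = (evaluate_loss(state_a) + evaluate_loss(state_b)) / 2
    path_losses = [
        evaluate_loss(interpolate_states(state_a, state_b, i / (num_points - 1)))
        for i in range(num_points)
    ]
    return max(path_losses) - endpoint_mean  # near zero => linearly connected minima

# Hypothetical usage: take the weights at training step k, train two copies with
# different data orders/augmentation seeds, and test whether the barrier is ~0.
# state_k = {k: v.clone() for k, v in model.state_dict().items()}
# state_1 = train_from(state_k, seed=1)   # assumed helper: finishes training under seed 1
# state_2 = train_from(state_k, seed=2)   # assumed helper: finishes training under seed 2
# stable_to_sgd_noise = linear_connectivity_barrier(state_1, state_2, evaluate_loss) < 0.02
```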
Author Information
Jonathan Frankle (MIT CSAIL)
Gintare Karolina Dziugaite (Element AI)
Daniel Roy (University of Toronto; Vector Institute)
Michael Carbin (MIT)
More from the Same Authors
- 2021 : Studying the Consistency and Composability of Lottery Ticket Pruning Masks
  Rajiv Movva · Michael Carbin · Jonathan Frankle
- 2021 : Towards a Unified Information-Theoretic Framework for Generalization
  Mahdi Haghifam · Gintare Karolina Dziugaite · Shay Moran
- 2021 : On the Generalization Improvement from Neural Network Pruning
  Tian Jin · Gintare Karolina Dziugaite · Michael Carbin
- 2022 : Pre-Training on a Data Diet: Identifying Sufficient Examples for Early Training
  Mansheej Paul · Brett Larsen · Surya Ganguli · Jonathan Frankle · Gintare Karolina Dziugaite
- 2022 : Knowledge Distillation for Efficient Sequences of Training Runs
  Xingyu Liu · Alexander Leonardi · Lu Yu · Christopher Gilmer-Hill · Matthew Leavitt · Jonathan Frankle
- 2023 : Flat minima can fail to transfer to downstream tasks
  Deepansha Singh · Ekansh Sharma · Daniel Roy · Gintare Karolina Dziugaite
- 2023 : Distributions for Compositionally Differentiating Parametric Discontinuities
  Jesse Michel · Kevin Mu · Xuanda Yang · Sai Praveen Bangaru · Elias Rojas Collins · Gilbert Bernstein · Jonathan Ragan-Kelley · Michael Carbin · Tzu-Mao Li
- 2023 : Can LLMs Generate Random Numbers? Evaluating LLM Sampling in Controlled Domains
  Aspen Hopkins · Alex Renda · Michael Carbin
- 2023 : Invited talk: Lessons Learned from Studying PAC-Bayes and Generalization
  Gintare Karolina Dziugaite
- 2022 : Finding Structured Winning Tickets with Early Pruning
  Udbhav Bamba · Devin Kwok · Gintare Karolina Dziugaite · David Rolnick
- 2022 Poster: What Can Linear Interpolation of Neural Network Loss Landscapes Tell Us?
  Tiffany Vlaar · Jonathan Frankle
- 2022 Spotlight: What Can Linear Interpolation of Neural Network Loss Landscapes Tell Us?
  Tiffany Vlaar · Jonathan Frankle
- 2021 Poster: On the Predictability of Pruning Across Scales
  Jonathan Rosenfeld · Jonathan Frankle · Michael Carbin · Nir Shavit
- 2021 Spotlight: On the Predictability of Pruning Across Scales
  Jonathan Rosenfeld · Jonathan Frankle · Michael Carbin · Nir Shavit
- 2020 : Q&A: Jonathan Frankle
  Jonathan Frankle · Mayoore Jaiswal
- 2020 : Contributed Talk: Jonathan Frankle
  Jonathan Frankle
- 2020 Poster: Generalization via Derandomization
  Jeffrey Negrea · Gintare Karolina Dziugaite · Daniel Roy
- 2020 Poster: Improved Bounds on Minimax Regret under Logarithmic Loss via Self-Concordance
  Blair Bilodeau · Dylan Foster · Daniel Roy
- 2019 : Panel Discussion (Nati Srebro, Dan Roy, Chelsea Finn, Mikhail Belkin, Aleksander Mądry, Jason Lee)
  Nati Srebro · Daniel Roy · Chelsea Finn · Mikhail Belkin · Aleksander Mądry · Jason Lee
- 2019 : Keynote by Dan Roy: Progress on Nonvacuous Generalization Bounds
  Daniel Roy
- 2019 Poster: Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks
  Charith Mendis · Alex Renda · Saman Amarasinghe · Michael Carbin
- 2019 Oral: Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks
  Charith Mendis · Alex Renda · Saman Amarasinghe · Michael Carbin
- 2018 Poster: Entropy-SGD optimizes the prior of a PAC-Bayes bound: Generalization properties of Entropy-SGD and data-dependent priors
  Gintare Karolina Dziugaite · Daniel Roy
- 2018 Oral: Entropy-SGD optimizes the prior of a PAC-Bayes bound: Generalization properties of Entropy-SGD and data-dependent priors
  Gintare Karolina Dziugaite · Daniel Roy