Towards understanding how momentum improves generalization in deep learning
Samy Jelassi · Yuanzhi Li
Stochastic gradient descent (SGD) with momentum is widely used for training modern deep learning architectures. While it is well understood that momentum can lead to a faster convergence rate in various settings, it has also been observed empirically that adding momentum yields better generalization. This paper formally studies how momentum helps generalization in deep learning:
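For readers unfamiliar with the update rule, here is a minimal sketch of one common (heavy-ball) form of momentum. It assumes a full-batch gradient and an illustrative, ill-conditioned two-dimensional quadratic. The function name `momentum_descent`, the quadratic, and the hyperparameters are hypothetical choices and not the paper's setup. The sketch only illustrates the well-known convergence speed-up mentioned in the abstract. It says nothing about the generalization effect the paper studies.

```python
import numpy as np


def momentum_descent(grad_fn, w0, lr=1.0, beta=0.9, steps=100):
    """Heavy-ball momentum (hypothetical helper): v <- beta * v + g, then w <- w - lr * v.

    With beta = 0 this reduces to plain gradient descent.
    """
    w = np.asarray(w0, dtype=float).copy()
    v = np.zeros_like(w)  # velocity: exponentially weighted sum of past gradients
    for _ in range(steps):
        g = grad_fn(w)  # full-batch gradient here; SGD would use a minibatch estimate
        v = beta * v + g
        w = w - lr * v
    return w


# Illustrative (assumed) ill-conditioned quadratic f(w) = 0.5 * sum(d * w**2), minimized at w = 0.
d = np.array([1.0, 0.01])


def grad(w):
    return d * w


w_gd = momentum_descent(grad, [1.0, 1.0], beta=0.0)   # plain gradient descent
w_mom = momentum_descent(grad, [1.0, 1.0], beta=0.9)  # heavy-ball momentum

print("GD distance to optimum:      ", np.linalg.norm(w_gd))
print("Momentum distance to optimum:", np.linalg.norm(w_mom))
```

On this quadratic, plain gradient descent stalls along the flat direction (curvature 0.01), while the momentum iterate ends up much closer to the optimum after the same number of steps. This is the convergence-rate benefit, which is distinct from the generalization benefit analyzed in the paper.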
Author Information
Samy Jelassi (Princeton University)
Yuanzhi Li (CMU)
More from the Same Authors

2021: When Is Generalizable Reinforcement Learning Tractable?
Dhruv Malik · Yuanzhi Li · Pradeep Ravikumar

2021: Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity
Dhruv Malik · Aldo Pacchiano · Vishwak Srinivasan · Yuanzhi Li

2021: Towards understanding how momentum improves generalization in deep learning
Samy Jelassi · Yuanzhi Li

2023 Poster: How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
Yuchen Li · Yuanzhi Li · Andrej Risteski

2023 Poster: The Benefits of Mixup for Feature Learning
Difan Zou · Yuan Cao · Yuanzhi Li · Quanquan Gu

2023 Poster: Weighted Tallying Bandits: Overcoming Intractability via Repeated Exposure Optimality
Dhruv Malik · Conor Igoe · Yuanzhi Li · Aarti Singh

2022 Poster: Towards understanding how momentum improves generalization in deep learning
Samy Jelassi · Yuanzhi Li

2022 Spotlight: Towards understanding how momentum improves generalization in deep learning
Samy Jelassi · Yuanzhi Li

2021 Poster: Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity
Dhruv Malik · Aldo Pacchiano · Vishwak Srinivasan · Yuanzhi Li

2021 Spotlight: Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity
Dhruv Malik · Aldo Pacchiano · Vishwak Srinivasan · Yuanzhi Li

2021 Poster: Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning
Zixin Wen · Yuanzhi Li

2021 Spotlight: Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning
Zixin Wen · Yuanzhi Li

2020 Poster: Extra-gradient with player sampling for faster convergence in n-player games
Samy Jelassi · Carles Domingo-Enrich · Damien Scieur · Arthur Mensch · Joan Bruna

2019 Poster: Neuron birth-death dynamics accelerates gradient descent and converges asymptotically
Grant Rotskoff · Samy Jelassi · Joan Bruna · Eric Vanden-Eijnden

2019 Oral: Neuron birth-death dynamics accelerates gradient descent and converges asymptotically
Grant Rotskoff · Samy Jelassi · Joan Bruna · Eric Vanden-Eijnden