
Towards understanding how momentum improves generalization in deep learning
Samy Jelassi · Yuanzhi Li

Stochastic gradient descent (SGD) with momentum is widely used for training modern deep learning architectures. While it is well understood that momentum can lead to a faster convergence rate in various settings, it has also been empirically observed that adding momentum yields better generalization. This paper formally studies how momentum helps generalization in deep learning.
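For reference, the SGD-with-momentum update the abstract refers to can be sketched as follows. This is a minimal heavy-ball variant on a toy quadratic loss; the loss function, learning rate, and momentum coefficient are illustrative choices, not taken from the paper.

```python
import numpy as np

def sgd_momentum(w0, lr=0.1, beta=0.9, steps=100):
    """Heavy-ball momentum SGD on the toy loss f(w) = 0.5 * ||w||^2,
    whose gradient is simply w. Hyperparameters are illustrative."""
    w = np.array(w0, dtype=float)
    v = np.zeros_like(w)           # velocity (momentum) buffer
    for _ in range(steps):
        grad = w                   # gradient of 0.5 * ||w||^2 at w
        v = beta * v + grad        # accumulate an exponential average of gradients
        w = w - lr * v             # step along the accumulated direction
    return w

w_final = sgd_momentum([1.0, -2.0])
```

Compared with plain SGD (`beta=0`), the velocity buffer smooths the update direction across iterations, which is the mechanism whose generalization effect the paper analyzes.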

Author Information

Samy Jelassi (Princeton University)
Yuanzhi Li (CMU)
