
Towards understanding how momentum improves generalization in deep learning
Samy Jelassi · Yuanzhi Li

Sat Jul 24 02:20 PM -- 02:35 PM (PDT)

Stochastic gradient descent (SGD) with momentum is widely used for training modern deep learning architectures. While it is well understood that using momentum can lead to a faster convergence rate in various settings, it has also been empirically observed that adding momentum yields better generalization. This paper formally studies how momentum helps generalization in deep learning:
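For reference, a minimal sketch of the heavy-ball momentum update the abstract refers to, applied here to a simple quadratic objective (the objective, step size, and momentum coefficient are illustrative choices, not taken from the paper):

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.1, beta=0.9):
    """One heavy-ball momentum step: v <- beta * v + grad; w <- w - lr * v.

    `lr` (step size) and `beta` (momentum coefficient) are hypothetical
    values chosen for illustration.
    """
    v = beta * v + grad
    w = w - lr * v
    return w, v

# Toy example: minimize f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(100):
    w, v = sgd_momentum_step(w, v, grad=w)
print(np.linalg.norm(w))  # the iterate contracts toward the minimizer at 0
```

With `beta = 0` this reduces to plain gradient descent; the accumulated velocity `v` is what distinguishes the momentum method.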