We study Nesterov's accelerated gradient method with constant step-size and momentum parameters in the stochastic approximation setting (unbiased gradients with bounded variance) and the finite-sum setting (where randomness stems from sampling mini-batches). To build better insight into the behavior of Nesterov's method in stochastic settings, we focus throughout on objectives that are smooth, strongly convex, and twice continuously differentiable. In the stochastic approximation setting, Nesterov's method converges to a neighborhood of the optimal point at the same accelerated rate as in the deterministic setting. Perhaps surprisingly, in the finite-sum setting, we prove that Nesterov's method may diverge with the usual choice of step-size and momentum, unless additional conditions on the problem related to conditioning and data coherence are satisfied. Our results shed light on why Nesterov's method may fail to converge or to achieve acceleration in the finite-sum setting.
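To make the setting concrete, the following is a minimal sketch of Nesterov's method with constant step-size and momentum applied to a finite-sum least-squares objective, using unbiased mini-batch gradients. The problem data, the interpolation assumption (the linear system is consistent, so gradient noise vanishes at the optimum), and the standard parameter choices α = 1/L and β = (√κ − 1)/(√κ + 1) are illustrative assumptions for this sketch, not values taken from the paper; as the paper shows, this recipe need not converge on every finite-sum problem.

```python
import numpy as np

# Illustrative finite-sum objective: f(x) = (1/n) * sum_i 0.5*(a_i^T x - b_i)^2.
# The data below are synthetic assumptions chosen so the problem is
# well-conditioned; this is a sketch, not the paper's experimental setup.
rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star                      # consistent system: x_star is the optimum

H = A.T @ A / n                     # Hessian of the objective
eigs = np.linalg.eigvalsh(H)
mu, L = eigs[0], eigs[-1]           # strong-convexity and smoothness constants
kappa = L / mu                      # condition number

alpha = 1.0 / L                                      # constant step-size
beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)   # constant momentum

def grad(x, idx=None):
    """Full gradient, or an unbiased mini-batch gradient if idx is given."""
    if idx is None:
        return A.T @ (A @ x - b) / n
    Ai = A[idx]
    return Ai.T @ (Ai @ x - b[idx]) / len(idx)

x_prev = np.zeros(d)
x = np.zeros(d)
for k in range(500):
    y = x + beta * (x - x_prev)             # extrapolation (momentum) step
    g = grad(y, rng.choice(n, size=20))     # stochastic mini-batch gradient at y
    x_prev, x = x, y - alpha * g            # gradient step from the extrapolated point

print(np.linalg.norm(x - x_star))
```

On this well-conditioned interpolating problem the iterates approach the optimum; the paper's divergence result concerns finite-sum problems where the conditioning and data-coherence requirements fail.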
Author Information
Mahmoud Assran (McGill University; Mila; Facebook AI Research)
Michael Rabbat (Facebook)