Distributed data-parallel algorithms aim to accelerate the training of deep neural networks by parallelizing the computation of large mini-batch gradient updates across multiple nodes. Approaches that synchronize nodes using exact distributed averaging (e.g., via All-Reduce) are sensitive to stragglers and communication delays. The PushSum gossip algorithm is robust to these issues, but only performs approximate distributed averaging. This paper studies Stochastic Gradient Push (SGP), which combines PushSum with stochastic gradient updates. We prove that SGP converges to a stationary point of smooth, non-convex objectives at the same sub-linear rate as SGD, that all nodes achieve consensus, and that SGP achieves a linear speedup with respect to the number of compute nodes. Furthermore, we empirically validate the performance of SGP on image classification and machine translation workloads. Our code, attached to the submission, will be made publicly available.
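The combination described in the abstract can be sketched in a few lines of NumPy. This is a minimal single-process simulation and not the authors' released code. The helper names (`column_stochastic_matrix`, `push_sum_average`, `sgp`), the example graph, and the quadratic objective in the usage note are all assumptions made for illustration. Each node keeps a biased parameter copy `x`, a scalar PushSum weight `w`, and a de-biased estimate `z = x / w`. Gradients are evaluated at `z`, and both `x` and `w` are mixed with the same column-stochastic matrix.

```python
import numpy as np


def column_stochastic_matrix(out_neighbors):
    """Build a mixing matrix P whose column j splits node j's mass uniformly
    between itself and its out-neighbors (hypothetical helper)."""
    n = len(out_neighbors)
    P = np.zeros((n, n))
    for j, neighbors in enumerate(out_neighbors):
        receivers = [j] + list(neighbors)  # self-loop plus outgoing edges
        P[receivers, j] = 1.0 / len(receivers)
    return P


def push_sum_average(values, P, steps):
    """Plain PushSum: every node's ratio x / w approaches the average of the
    initial values, even though P is only column-stochastic."""
    x = np.array(values, dtype=float)
    w = np.ones(len(x))
    for _ in range(steps):
        x = P @ x  # push the numerators along the directed edges
        w = P @ w  # push the weights along the same edges
    return x / w


def sgp(grad_fn, x0, P, lr, steps):
    """Sketch of Stochastic Gradient Push on n simulated nodes.

    grad_fn(i, z_i) returns node i's (possibly stochastic) gradient at z_i.
    x0 has shape (n, d): one parameter vector per node.
    """
    x = np.array(x0, dtype=float)
    w = np.ones(x.shape[0])
    z = x.copy()  # de-biased parameters; equal to x while w is all ones
    for _ in range(steps):
        grads = np.stack([grad_fn(i, z[i]) for i in range(len(z))])
        x = P @ (x - lr * grads)  # local gradient step, then one gossip round
        w = P @ w                 # PushSum weights follow the same mixing
        z = x / w[:, None]        # de-bias before the next gradient evaluation
    return z
```

As a usage example under the same assumptions, take a directed ring of four nodes with one extra edge from node 0 to node 2: `P = column_stochastic_matrix([[1, 2], [2], [3], [0]])`. The rows of this `P` do not sum to one, so dividing by `w` is what removes the bias. Calling `sgp(lambda i, z: z - target, x0, P, lr=0.1, steps=500)` on a shared quadratic objective drives every node's `z` toward `target`, which illustrates the consensus behaviour the abstract refers to.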
Author Information
Mahmoud Assran (McGill University/Facebook AI Research)
Nicolas Loizou (The University of Edinburgh)
https://www.maths.ed.ac.uk/~s1461357/
Nicolas Ballas (Facebook FAIR)
Michael Rabbat (Facebook)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Poster: Stochastic Gradient Push for Distributed Deep Learning
  Thu. Jun 13th 01:30 -- 04:00 AM Room Pacific Ballroom #183
More from the Same Authors
- 2022: Positive Unlabeled Contrastive Representation Learning
  Anish Acharya · Sujay Sanghavi · Li Jing · Bhargav Bhushanam · Michael Rabbat · Dhruv Choudhary · Inderjit Dhillon
- 2022 Poster: Federated Learning with Partial Model Personalization
  Krishna Pillutla · Kshitiz Malik · Abdel-rahman Mohamed · Michael Rabbat · Maziar Sanjabi · Lin Xiao
- 2022 Spotlight: Federated Learning with Partial Model Personalization
  Krishna Pillutla · Kshitiz Malik · Abdel-rahman Mohamed · Michael Rabbat · Maziar Sanjabi · Lin Xiao
- 2020 Poster: On the Convergence of Nesterov's Accelerated Gradient Method in Stochastic Settings
  Mahmoud Assran · Michael Rabbat
- 2019 Poster: TarMAC: Targeted Multi-Agent Communication
  Abhishek Das · Theophile Gervet · Joshua Romoff · Dhruv Batra · Devi Parikh · Michael Rabbat · Joelle Pineau
- 2019 Oral: TarMAC: Targeted Multi-Agent Communication
  Abhishek Das · Theophile Gervet · Joshua Romoff · Dhruv Batra · Devi Parikh · Michael Rabbat · Joelle Pineau
- 2019 Poster: SGD: General Analysis and Improved Rates
  Robert Gower · Nicolas Loizou · Xun Qian · Alibek Sailanbayev · Egor Shulgin · Peter Richtarik
- 2019 Oral: SGD: General Analysis and Improved Rates
  Robert Gower · Nicolas Loizou · Xun Qian · Alibek Sailanbayev · Egor Shulgin · Peter Richtarik