Existing approaches to federated learning suffer from a communication bottleneck as well as convergence issues due to sparse client participation. In this paper we introduce a novel algorithm, called FetchSGD, to overcome these challenges. FetchSGD compresses model updates using a Count Sketch, and then takes advantage of the mergeability of sketches to combine model updates from many workers. A key insight in the design of FetchSGD is that, because the Count Sketch is linear, momentum and error accumulation can both be carried out within the sketch. This allows the algorithm to move momentum and error accumulation from clients to the central aggregator, overcoming the challenges of sparse client participation while still achieving high compression rates and good convergence. We prove that FetchSGD has favorable convergence guarantees, and we demonstrate its empirical effectiveness by training two residual networks and a transformer model.
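The key property the abstract relies on is that the Count Sketch is a linear operator, so the sketch of a sum of gradients equals the sum of the sketches, and momentum or error accumulation can be applied directly to sketch tables on the server. The following is a minimal illustrative sketch of that property, not the paper's implementation; the class name, parameters, and the simple median-of-signed-counters query are assumptions for illustration.

```python
import numpy as np

class CountSketch:
    """Minimal Count Sketch over a dense vector (illustrative only)."""
    def __init__(self, rows, cols, dim, seed=0):
        # All sketches that will be merged must share the same hash functions,
        # which here means the same seed.
        rng = np.random.default_rng(seed)
        self.buckets = rng.integers(0, cols, size=(rows, dim))       # bucket hash per row
        self.signs = rng.choice([-1.0, 1.0], size=(rows, dim))       # sign hash per row
        self.table = np.zeros((rows, cols))

    def accumulate(self, vec):
        # Sketching is linear in vec: accumulate(a) then accumulate(b)
        # yields the same table as accumulate(a + b).
        for r in range(self.table.shape[0]):
            np.add.at(self.table[r], self.buckets[r], self.signs[r] * vec)

    def query(self, i):
        # Estimate coordinate i as the median of its signed counters.
        rows = np.arange(self.table.shape[0])
        return np.median(self.signs[rows, i] * self.table[rows, self.buckets[rows, i]])

dim = 1000
a = np.random.default_rng(1).standard_normal(dim)
b = np.random.default_rng(2).standard_normal(dim)

s_a = CountSketch(5, 100, dim); s_a.accumulate(a)
s_b = CountSketch(5, 100, dim); s_b.accumulate(b)
s_ab = CountSketch(5, 100, dim); s_ab.accumulate(a + b)

# Linearity: the sketch of a sum equals the sum of the sketches, so a server
# can merge clients' sketched updates by table addition, and can likewise keep
# a momentum accumulator entirely in sketch space (e.g., m = rho * m + table).
assert np.allclose(s_ab.table, s_a.table + s_b.table)
```

Because merging is plain table addition, clients that drop out of a round impose no per-client server state beyond the merged tables, which is the mechanism the abstract credits for tolerating sparse client participation.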
Author Information
Daniel Rothchild (UC Berkeley)
Ashwinee Panda (UC Berkeley)
Enayat Ullah (Johns Hopkins University)
Nikita Ivkin (Amazon)
Ion Stoica (UC Berkeley)
Vladimir Braverman (Johns Hopkins University)
Joseph Gonzalez (UC Berkeley)
Raman Arora (Johns Hopkins University)
More from the Same Authors
- 2020 Poster: Coresets for Clustering in Graphs of Bounded Treewidth »
  Daniel Baker · Vladimir Braverman · Lingxiao Huang · Shaofeng H.-C. Jiang · Robert Krauthgamer · Xuan Wu
- 2020 Poster: Frustratingly Simple Few-Shot Object Detection »
  Xin Wang · Thomas Huang · Joseph Gonzalez · Trevor Darrell · Fisher Yu
- 2020 Poster: Schatten Norms in Matrix Streams: Hello Sparsity, Goodbye Dimension »
  Vladimir Braverman · Robert Krauthgamer · Aditya Krishnan · Roi Sinoff
- 2020 Poster: Variable Skipping for Autoregressive Range Density Estimation »
  Eric Liang · Zongheng Yang · Ion Stoica · Pieter Abbeel · Yan Duan · Peter Chen
- 2020 Poster: Obtaining Adjustable Regularization for Free via Iterate Averaging »
  Jingfeng Wu · Vladimir Braverman · Lin Yang
- 2020 Poster: On the Noisy Gradient Descent that Generalizes as SGD »
  Jingfeng Wu · Wenqing Hu · Haoyi Xiong · Jun Huan · Vladimir Braverman · Zhanxing Zhu
- 2019 Workshop: AI in Finance: Applications and Infrastructure for Multi-Agent Learning »
  Prashant Reddy · Tucker Balch · Michael Wellman · Senthil Kumar · Ion Stoica · Edith Elkind
- 2019 Poster: Coresets for Ordered Weighted Clustering »
  Vladimir Braverman · Shaofeng Jiang · Robert Krauthgamer · Xuan Wu
- 2019 Oral: Coresets for Ordered Weighted Clustering »
  Vladimir Braverman · Shaofeng Jiang · Robert Krauthgamer · Xuan Wu
- 2019 Poster: On Dropout and Nuclear Norm Regularization »
  Poorya Mianjy · Raman Arora
- 2019 Oral: On Dropout and Nuclear Norm Regularization »
  Poorya Mianjy · Raman Arora
- 2018 Poster: RLlib: Abstractions for Distributed Reinforcement Learning »
  Eric Liang · Richard Liaw · Robert Nishihara · Philipp Moritz · Roy Fox · Ken Goldberg · Joseph Gonzalez · Michael Jordan · Ion Stoica
- 2018 Poster: On the Implicit Bias of Dropout »
  Poorya Mianjy · Raman Arora · Rene Vidal
- 2018 Oral: On the Implicit Bias of Dropout »
  Poorya Mianjy · Raman Arora · Rene Vidal
- 2018 Oral: RLlib: Abstractions for Distributed Reinforcement Learning »
  Eric Liang · Richard Liaw · Robert Nishihara · Philipp Moritz · Roy Fox · Ken Goldberg · Joseph Gonzalez · Michael Jordan · Ion Stoica
- 2018 Poster: Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order »
  Vladimir Braverman · Stephen Chestnut · Robert Krauthgamer · Yi Li · David Woodruff · Lin Yang
- 2018 Oral: Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order »
  Vladimir Braverman · Stephen Chestnut · Robert Krauthgamer · Yi Li · David Woodruff · Lin Yang
- 2018 Poster: Streaming Principal Component Analysis in Noisy Setting »
  Teodor Vanislavov Marinov · Poorya Mianjy · Raman Arora
- 2018 Poster: Stochastic PCA with $\ell_2$ and $\ell_1$ Regularization »
  Poorya Mianjy · Raman Arora
- 2018 Oral: Streaming Principal Component Analysis in Noisy Setting »
  Teodor Vanislavov Marinov · Poorya Mianjy · Raman Arora
- 2018 Oral: Stochastic PCA with $\ell_2$ and $\ell_1$ Regularization »
  Poorya Mianjy · Raman Arora
- 2017 Poster: Clustering High Dimensional Dynamic Data Streams »
  Lin Yang · Harry Lang · Christian Sohler · Vladimir Braverman · Gereon Frahling
- 2017 Talk: Clustering High Dimensional Dynamic Data Streams »
  Lin Yang · Harry Lang · Christian Sohler · Vladimir Braverman · Gereon Frahling