Spotlight
Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning
Tomoya Murata · Taiji Suzuki
Recently, local SGD has received much attention and has been extensively studied in the distributed learning community as a way to overcome the communication bottleneck. However, the superiority of local SGD over minibatch SGD holds only in quite limited situations. In this paper, we study a new local algorithm called Bias-Variance Reduced Local SGD (BVR-L-SGD) for nonconvex distributed optimization. Algorithmically, our proposed bias- and variance-reduced local gradient estimator fully utilizes the small second-order heterogeneity of local objectives, and the algorithm randomly picks one of the local models, instead of taking their average, when workers are synchronized. Theoretically, under small heterogeneity of local objectives, we show that BVR-L-SGD achieves better communication complexity than both previous non-local and local methods under mild conditions; in particular, BVR-L-SGD is the first method that breaks the barrier of communication complexity $\Theta(1/\varepsilon)$ for general nonconvex smooth objectives when the heterogeneity is small and the local computation budget is large. Numerical results verify the theoretical findings and give empirical evidence of the superiority of our method.
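The abstract names two algorithmic ingredients: a bias- and variance-reduced local gradient estimator that exploits small second-order heterogeneity, and synchronization by randomly picking one local model rather than averaging. The sketch below illustrates that structure on hypothetical heterogeneous quadratic objectives; the SVRG-style control variate is only a stand-in for the paper's exact BVR estimator, and all constants and names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_workers, rounds, local_steps, lr = 10, 8, 50, 20, 0.05

# Hypothetical local objectives f_i(x) = 0.5 * x^T A_i x - b_i^T x.
# Small random perturbations keep the second-order heterogeneity
# ||A_i - A_j|| small, matching the regime the abstract targets.
A = [np.eye(d) + 0.1 * rng.standard_normal((d, d)) for _ in range(n_workers)]
A = [0.5 * (Ai + Ai.T) + d * np.eye(d) for Ai in A]  # symmetric, well-conditioned
b = [rng.standard_normal(d) for _ in range(n_workers)]

def grad(i, x):
    return A[i] @ x - b[i]

x_sync = np.zeros(d)
for _ in range(rounds):
    # Communication: broadcast the synchronized model and the global
    # gradient at it (used below as a bias-correcting control variate).
    g_sync = np.mean([grad(i, x_sync) for i in range(n_workers)], axis=0)
    local_models = []
    for i in range(n_workers):
        x = x_sync.copy()
        for _ in range(local_steps):
            # SVRG-style corrected local gradient: exact at x_sync, and its
            # drift during local steps scales with second-order heterogeneity.
            x -= lr * (grad(i, x) - grad(i, x_sync) + g_sync)
        local_models.append(x)
    # Synchronization: pick one local model uniformly at random
    # instead of averaging, as the abstract suggests.
    x_sync = local_models[rng.integers(n_workers)]

print("final global gradient norm:",
      np.linalg.norm(np.mean([grad(i, x_sync) for i in range(n_workers)], axis=0)))
```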
Author Information
Tomoya Murata (NTT DATA Mathematical Systems Inc.)
Taiji Suzuki (The University of Tokyo / RIKEN)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning
  Tue. Jul 20th 04:00 -- 06:00 PM
More from the Same Authors
- 2023: Benign Overfitting of Two-Layer Neural Networks under Inputs with Intrinsic Dimension
  Shunta Akiyama · Kazusato Oko · Taiji Suzuki
- 2023: Graph Neural Networks Provably Benefit from Structural Information: A Feature Learning Perspective
  Wei Huang · Yuan Cao · Haonan Wang · Xin Cao · Taiji Suzuki
- 2023: Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective
  Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu
- 2023: Learning Green's Function Efficiently Using Low-Rank Approximations
  Kishan Wimalawarne · Taiji Suzuki · Sophie Langer
- 2023 Poster: DIFF2: Differential Private Optimization via Gradient Differences for Nonconvex Distributed Learning
  Tomoya Murata · Taiji Suzuki
- 2023 Poster: Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems
  Atsushi Nitanda · Kazusato Oko · Denny Wu · Nobuhito Takenouchi · Taiji Suzuki
- 2023 Oral: Diffusion Models are Minimax Optimal Distribution Estimators
  Kazusato Oko · Shunta Akiyama · Taiji Suzuki
- 2023 Poster: Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input
  Shokichi Takakura · Taiji Suzuki
- 2023 Poster: Diffusion Models are Minimax Optimal Distribution Estimators
  Kazusato Oko · Shunta Akiyama · Taiji Suzuki
- 2023 Poster: Tight and fast generalization error bound of graph embedding in metric space
  Atsushi Suzuki · Atsushi Nitanda · Taiji Suzuki · Jing Wang · Feng Tian · Kenji Yamanishi
- 2021 Poster: On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting
  Shunta Akiyama · Taiji Suzuki
- 2021 Spotlight: On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting
  Shunta Akiyama · Taiji Suzuki
- 2021 Poster: Quantitative Understanding of VAE as a Non-linearly Scaled Isometric Embedding
  Akira Nakagawa · Keizo Kato · Taiji Suzuki
- 2021 Spotlight: Quantitative Understanding of VAE as a Non-linearly Scaled Isometric Embedding
  Akira Nakagawa · Keizo Kato · Taiji Suzuki
- 2019 Poster: Approximation and non-parametric estimation of ResNet-type convolutional neural networks
  Kenta Oono · Taiji Suzuki
- 2019 Oral: Approximation and non-parametric estimation of ResNet-type convolutional neural networks
  Kenta Oono · Taiji Suzuki
- 2018 Poster: Functional Gradient Boosting based on Residual Network Perception
  Atsushi Nitanda · Taiji Suzuki
- 2018 Oral: Functional Gradient Boosting based on Residual Network Perception
  Atsushi Nitanda · Taiji Suzuki