Poster in Workshop: Federated Learning and Analytics in Practice: Algorithms, Systems, Applications, and Opportunities

On the Still Unreasonable Effectiveness of Federated Averaging for Heterogeneous Distributed Learning

Kumar Kshitij Patel · Margalit Glasgow · Lingxiao Wang · Nirmit Joshi · Nati Srebro


Abstract:

Federated Averaging (local SGD) is the most common optimization method for federated learning and has proven effective in many real-world applications, dominating simple baselines like mini-batch SGD. Theoretically establishing this effectiveness, however, remains challenging, leaving a large gap between theory and practice. In this paper, we provide new lower bounds for local SGD that rule out previously proposed heterogeneity assumptions intended to capture this "unreasonable" effectiveness. We show that accelerated mini-batch SGD is, in fact, min-max optimal under some of these heterogeneity notions. This highlights the need for new heterogeneity assumptions for federated optimization, and we discuss some alternative assumptions.
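
To make the two algorithms being compared concrete, here is a minimal Python sketch contrasting local SGD (Federated Averaging) with the mini-batch SGD baseline under a matched per-round gradient budget. The synthetic heterogeneous least-squares setup, all parameter values, and function names are illustrative assumptions for exposition, not the paper's experimental setup or results.

```python
# Minimal sketch: local SGD (FedAvg) vs. mini-batch SGD on a synthetic
# heterogeneous problem. All objectives and hyperparameters are
# illustrative assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
M, K, R, lr, d = 10, 20, 50, 0.01, 5  # machines, local steps, rounds, step size, dimension

# Heterogeneous clients: machine m has its own quadratic f_m(x) = ||A_m x - b_m||^2 / 2.
A = rng.normal(size=(M, d, d))
b = rng.normal(size=(M, d))

def grad(m, x):
    """Stochastic gradient of f_m at x (exact gradient plus Gaussian noise)."""
    return A[m].T @ (A[m] @ x - b[m]) + 0.1 * rng.normal(size=d)

def local_sgd(x0):
    """Local SGD / FedAvg: K local steps per machine, then average, for R rounds."""
    x = x0.copy()
    for _ in range(R):
        local_iterates = []
        for m in range(M):
            xm = x.copy()
            for _ in range(K):              # K sequential local updates
                xm -= lr * grad(m, xm)
            local_iterates.append(xm)
        x = np.mean(local_iterates, axis=0)  # communication: average the iterates
    return x

def minibatch_sgd(x0):
    """Mini-batch SGD baseline: one step per communication round, averaging the
    same K gradients per machine that local SGD would spend on local updates."""
    x = x0.copy()
    for _ in range(R):
        g = np.mean([grad(m, x) for m in range(M) for _ in range(K)], axis=0)
        x -= lr * g
    return x

x0 = np.zeros(d)
f = lambda x: np.mean([0.5 * np.sum((A[m] @ x - b[m]) ** 2) for m in range(M)])
print("local SGD  objective:", f(local_sgd(x0)))
print("mini-batch objective:", f(minibatch_sgd(x0)))
```

Both methods use the same number of communication rounds and gradient evaluations; the difference is that local SGD takes K sequential steps on each client between rounds, while mini-batch SGD averages those gradients into a single step, which is exactly the trade-off the lower bounds in the paper speak to.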
