Poster in Workshop: Federated Learning and Analytics in Practice: Algorithms, Systems, Applications, and Opportunities
On the Still Unreasonable Effectiveness of Federated Averaging for Heterogeneous Distributed Learning
Kumar Kshitij Patel · Margalit Glasgow · Lingxiao Wang · Nirmit Joshi · Nati Srebro
Federated Averaging (FedAvg), also known as local SGD, is the most common optimization method for federated learning. It has proven effective in many real-world applications, often outperforming simple baselines such as mini-batch SGD. However, theoretically explaining this effectiveness remains challenging, leaving a large gap between theory and practice. In this paper, we provide new lower bounds for local SGD, ruling out proposed heterogeneity assumptions that try to capture this "unreasonable" effectiveness of local SGD. We show that accelerated mini-batch SGD is, in fact, min-max optimal under some of these heterogeneity notions. This highlights the need for new heterogeneity assumptions for federated optimization, and we discuss some alternatives.
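For readers unfamiliar with the two methods contrasted in the abstract, below is a minimal sketch of local SGD (FedAvg) next to the mini-batch SGD baseline. The quadratic client objectives, client count, step sizes, and number of local steps are illustrative assumptions for this sketch, not details from the paper; gradients are taken exactly here, so the "SGD" steps are deterministic.

```python
import numpy as np

rng = np.random.default_rng(0)
M, d = 4, 5  # number of clients and problem dimension (illustrative)

# Heterogeneous quadratic client objectives f_m(x) = 0.5 x^T A_m x - b_m^T x
A = [np.diag(rng.uniform(0.5, 2.0, d)) for _ in range(M)]
b = [rng.normal(size=d) for _ in range(M)]

def grad(m, x):
    # Exact gradient of client m's objective.
    return A[m] @ x - b[m]

def local_sgd(x0, rounds=50, K=10, lr=0.05):
    # FedAvg / local SGD: each client runs K local gradient steps
    # from the shared iterate, then the server averages the results.
    x = x0.copy()
    for _ in range(rounds):
        local_iterates = []
        for m in range(M):
            y = x.copy()
            for _ in range(K):
                y -= lr * grad(m, y)
            local_iterates.append(y)
        x = np.mean(local_iterates, axis=0)
    return x

def minibatch_sgd(x0, rounds=50, lr=0.05):
    # Mini-batch SGD baseline: one step per communication round
    # on the gradient averaged across all clients.
    x = x0.copy()
    for _ in range(rounds):
        g = np.mean([grad(m, x) for m in range(M)], axis=0)
        x -= lr * g
    return x

# The global minimizer of the average objective solves (mean A) x = mean b.
x_star = np.linalg.solve(np.mean(A, axis=0), np.mean(b, axis=0))
print("local SGD error:     ", np.linalg.norm(local_sgd(np.zeros(d)) - x_star))
print("mini-batch SGD error:", np.linalg.norm(minibatch_sgd(np.zeros(d)) - x_star))
```

Both methods use the same number of communication rounds; local SGD performs K extra local steps per round, which is where its practical advantage (and, under heterogeneity, its client-drift bias) comes from.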