On the Still Unreasonable Effectiveness of Federated Averaging for Heterogeneous Distributed Learning
Kumar Kshitij Patel · Margalit Glasgow · Lingxiao Wang · Nirmit Joshi · Nati Srebro
Event URL: https://openreview.net/forum?id=vhS68bKv7x

Federated Averaging/local SGD is the most common optimization method for federated learning and has proven effective in many real-world applications, dominating simple baselines like mini-batch SGD. However, showing the effectiveness of local SGD theoretically remains challenging, leaving a large gap between theory and practice. In this paper, we provide new lower bounds for local SGD that rule out proposed heterogeneity assumptions intended to capture this "unreasonable" effectiveness. We show that accelerated mini-batch SGD is, in fact, min-max optimal under some heterogeneity notions. This highlights the need for new heterogeneity assumptions for federated optimization, and we discuss some alternative assumptions.
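
For readers unfamiliar with the two methods compared in the abstract, the following is a minimal sketch of how local SGD and mini-batch SGD differ when both use the same number of communication rounds and the same number of stochastic gradients per client per round. It is not the authors' code: the scalar quadratic client objectives, the Gaussian gradient noise, and the names `stochastic_grad`, `local_sgd`, `minibatch_sgd`, `centers`, `local_steps`, and `noise_std` are all assumptions chosen for illustration, and the sketch uses plain (not accelerated) mini-batch SGD.

```python
import random


def stochastic_grad(x, center, rng, noise_std):
    # Hypothetical client objective: f_m(x) = 0.5 * (x - center)**2.
    # Returns its gradient at x plus zero-mean Gaussian noise.
    return (x - center) + rng.gauss(0.0, noise_std)


def local_sgd(centers, rounds, local_steps, lr, noise_std=0.0, seed=0):
    # Each client runs `local_steps` SGD steps on its own objective,
    # starting from the shared model; the server then averages the results.
    rng = random.Random(seed)
    x = 0.0
    for _ in range(rounds):
        local_models = []
        for center in centers:
            x_m = x
            for _ in range(local_steps):
                x_m -= lr * stochastic_grad(x_m, center, rng, noise_std)
            local_models.append(x_m)
        x = sum(local_models) / len(local_models)
    return x


def minibatch_sgd(centers, rounds, local_steps, lr, noise_std=0.0, seed=0):
    # Each client draws `local_steps` stochastic gradients at the SAME shared
    # point; the server averages all of them and takes one step per round.
    rng = random.Random(seed)
    x = 0.0
    for _ in range(rounds):
        grads = [
            stochastic_grad(x, center, rng, noise_std)
            for center in centers
            for _ in range(local_steps)
        ]
        x -= lr * sum(grads) / len(grads)
    return x


if __name__ == "__main__":
    # Two clients whose optima differ (a toy form of heterogeneity);
    # the minimizer of the averaged objective is the mean of the centers.
    centers = [1.0, 3.0]
    print(local_sgd(centers, rounds=20, local_steps=4, lr=0.5))
    print(minibatch_sgd(centers, rounds=20, local_steps=4, lr=0.5))
```

In this toy setting both methods approach the minimizer of the averaged objective, so the sketch only illustrates the structural difference (several local steps followed by model averaging versus one averaged-gradient step per round), not the lower bounds or optimality results stated in the abstract.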

Author Information

Kumar Kshitij Patel (Toyota Technological Institute at Chicago)
Margalit Glasgow (Stanford University)
Lingxiao Wang (Toyota Technological Institute at Chicago)
Nirmit Joshi (Toyota Technological Institute at Chicago)
Nati Srebro (Toyota Technological Institute at Chicago)
