
FedAvg Converges to Zero Training Loss Linearly for Overparameterized Multi-Layer Neural Networks
Bingqing Song · Prashant Khanduri · Xinwei Zhang · Jinfeng Yi · Mingyi Hong

Tue Jul 25 02:00 PM -- 04:30 PM (PDT) @ Exhibit Hall 1 #315
Federated Learning (FL) is a distributed learning paradigm that allows multiple clients to learn a joint model by utilizing privately held data at each client. Significant research effort has been devoted to developing advanced algorithms that handle the situation where the data at individual clients have heterogeneous distributions. In this work, we show that data heterogeneity can be dealt with from a different perspective. That is, by utilizing a certain overparameterized multi-layer neural network at each client, even the vanilla FedAvg (a.k.a. Local SGD) algorithm can accurately optimize the training problem: when each client has a neural network with one wide layer of size $N$ (where $N$ is the total number of training samples), followed by layers of smaller widths, FedAvg converges linearly to a solution that achieves (almost) zero training loss, without requiring any assumptions on the clients' data distributions. To our knowledge, this is the first work that demonstrates such resilience to data heterogeneity for FedAvg when trained on multi-layer neural networks. Our experiments also confirm that neural networks of large size achieve better and more stable performance on FL problems.
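The following is a minimal sketch of the FedAvg (Local SGD) procedure described in the abstract, written in PyTorch. The wide first layer of width $N$ (total sample count) mirrors the architecture studied in the paper; the client data, learning rate, number of local steps, and the smaller layer widths are illustrative assumptions, not the paper's experimental setup.

```python
# Hedged sketch of FedAvg with an overparameterized MLP (assumed setup).
import copy
import torch
import torch.nn as nn

def make_model(d_in, N, d_out=1):
    # One wide hidden layer of width N, followed by layers of smaller widths.
    return nn.Sequential(
        nn.Linear(d_in, N), nn.ReLU(),
        nn.Linear(N, 64), nn.ReLU(),
        nn.Linear(64, d_out),
    )

def local_sgd(global_model, data, targets, lr=1e-2, steps=10):
    # Each client copies the global model and runs SGD on its private data.
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(data), targets).backward()
        opt.step()
    return model.state_dict()

def fedavg_round(global_model, client_datasets):
    # Broadcast the global model, train locally, then average the weights.
    client_states = [local_sgd(global_model, x, y) for x, y in client_datasets]
    avg_state = {
        k: torch.stack([s[k] for s in client_states]).mean(dim=0)
        for k in client_states[0]
    }
    global_model.load_state_dict(avg_state)
    return global_model

# Illustrative usage: heterogeneous clients, first-layer width N = total samples.
torch.manual_seed(0)
clients = [(torch.randn(20, 5), torch.randn(20, 1)) for _ in range(4)]
N = sum(x.shape[0] for x, _ in clients)
model = make_model(d_in=5, N=N)
for _ in range(50):
    model = fedavg_round(model, clients)
```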

Author Information

Bingqing Song (University of Minnesota)
Prashant Khanduri (Wayne State University)
Xinwei Zhang (University of Minnesota)
Jinfeng Yi (JD AI Research)
Mingyi Hong (University of Minnesota)
