
Layerwise Linear Mode Connectivity
Linara Adilova · Asja Fischer · Martin Jaggi

In a federated setup, separate local models are aggregated multiple times during training in order to obtain a stronger global model; most often the aggregation is a simple averaging of the parameters. Understanding why averaging works in a non-convex setting, such as federated deep learning, is an open challenge that hinders obtaining highly performant global models. On i.i.d. datasets, federated deep learning with frequent averaging is successful. The common understanding, however, is that during independent training the models drift away from each other, so averaging may no longer work. This can be seen from the perspective of the loss surface of the problem: on a non-convex surface, the average of two models can be arbitrarily bad. Assuming local convexity contradicts the empirical evidence that high loss barriers exist between models from the very beginning of learning, even when training on the same data. Based on the observation that the learning process evolves differently in different layers, we investigate the barrier between models in a layerwise fashion. Our conjecture is that the barriers preventing successful federated training are caused by a particular layer or group of layers.
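The layerwise barrier idea can be sketched in code. The following is a minimal illustration, not the authors' implementation: the loss barrier between two models is the maximal gap between the loss along the linear path connecting their parameters and the linear interpolation of the endpoint losses; the layerwise variant interpolates one layer at a time while keeping all other layers fixed at one model's weights. The function names (`barrier`, `layerwise_barriers`) and the toy quadratic loss are assumptions for the sake of a self-contained example.

```python
import numpy as np

def barrier(loss, theta_a, theta_b, n_points=11):
    """Loss barrier along the linear path between two parameter vectors:
    max over alpha of loss((1-a)*theta_a + a*theta_b) minus the linear
    interpolation of the endpoint losses."""
    alphas = np.linspace(0.0, 1.0, n_points)
    path = [loss((1 - a) * theta_a + a * theta_b) for a in alphas]
    ends = [(1 - a) * path[0] + a * path[-1] for a in alphas]
    return max(p - e for p, e in zip(path, ends))

def layerwise_barriers(loss, layers_a, layers_b, n_points=11):
    """Barrier contributed by each layer alone: interpolate only that
    layer's weights, keeping every other layer at model A's weights."""
    result = {}
    for name in layers_a:
        def loss_one_layer(theta, name=name):
            params = dict(layers_a)   # all layers from model A ...
            params[name] = theta      # ... except the interpolated one
            return loss(params)
        result[name] = barrier(loss_one_layer,
                               layers_a[name], layers_b[name], n_points)
    return result

# Toy non-convex loss: each "layer" wants unit squared norm.
toy_loss = lambda p: sum((np.sum(w ** 2) - 1.0) ** 2 for w in p.values())
la = {"layer1": np.array([1.0, 0.0]), "layer2": np.array([1.0, 0.0])}
lb = {"layer1": np.array([0.0, 1.0]), "layer2": np.array([0.0, 1.0])}
print(layerwise_barriers(toy_loss, la, lb))  # barrier of 0.25 per layer
```

Both endpoints have zero loss, yet the midpoint of each layer's path has squared norm 0.5 and hence a positive loss, so each layer contributes a barrier of 0.25; in a real experiment the per-layer barriers would instead be estimated on trained network checkpoints.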

Author Information

Linara Adilova (Ruhr University Bochum)
Asja Fischer (Ruhr University Bochum)
Martin Jaggi (EPFL)
