

Poster
in
Workshop: HiLD: High-dimensional Learning Dynamics Workshop

Layerwise Linear Mode Connectivity

Linara Adilova · Asja Fischer · Martin Jaggi


Abstract:

In a federated setup, separate local models are aggregated multiple times during training in order to obtain a stronger global model; most often this aggregation is a simple averaging of the parameters. Understanding why averaging works in a non-convex setting, such as federated deep learning, is an open challenge that hinders obtaining highly performant global models. On i.i.d. datasets, federated deep learning with frequent averaging is successful. The common understanding, however, is that during independent training the models drift away from each other, and thus averaging may stop working. This can be seen from the perspective of the loss surface of the problem: on a non-convex surface, the average of two points can become arbitrarily bad. Assuming local convexity contradicts the empirical evidence that high loss barriers exist between models from the very beginning of training, even when training on the same data. Based on the observation that the learning process evolves differently in different layers, we investigate the barrier between models in a layerwise fashion. Our conjecture is that the barriers preventing successful federated training are caused by a particular layer or group of layers.
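The central quantity in the abstract is the loss barrier along the linear path between two trained models, evaluated either over all parameters (plain averaging) or over a single layer while the rest is kept from one model. The sketch below shows one way such a layerwise barrier could be estimated in PyTorch; the helper names, the barrier definition (maximum deviation above the linear baseline between the endpoint losses), and the model/data placeholders are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): loss barrier along the linear path
# between two models, for full-model or layerwise interpolation.
import copy
import torch

def interpolate_state_dicts(sd_a, sd_b, alpha, keys=None):
    """Return (1 - alpha) * A + alpha * B for the selected parameters.
    If `keys` is given, only those entries are interpolated; all other
    parameters are taken from model A (layerwise interpolation)."""
    keys = set(sd_a.keys()) if keys is None else set(keys)
    return {
        k: (1 - alpha) * sd_a[k] + alpha * sd_b[k]
        if k in keys and sd_a[k].is_floating_point() else sd_a[k].clone()
        for k in sd_a
    }

@torch.no_grad()
def average_loss(model, loader, loss_fn, device="cpu"):
    """Mean loss of `model` over the data loader."""
    model.eval().to(device)
    total, n = 0.0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        total += loss_fn(model(x), y).item() * x.size(0)
        n += x.size(0)
    return total / n

def loss_barrier(model_template, sd_a, sd_b, loader, loss_fn, keys=None, steps=11):
    """Barrier height: max over alpha of the interpolated loss minus the
    linear baseline between the two endpoint losses."""
    alphas = torch.linspace(0, 1, steps)
    losses = []
    for alpha in alphas:
        model = copy.deepcopy(model_template)
        model.load_state_dict(interpolate_state_dicts(sd_a, sd_b, alpha.item(), keys))
        losses.append(average_loss(model, loader, loss_fn))
    baseline = [(1 - a.item()) * losses[0] + a.item() * losses[-1] for a in alphas]
    return max(l - b for l, b in zip(losses, baseline))
```

As a usage pattern under these assumptions, one would compare `loss_barrier(..., keys=None)` (averaging every parameter) against calls where `keys` is restricted to the parameters of a single layer, to see which layer or group of layers accounts for most of the barrier.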
