A Bias-Variance Analysis of Weight Averaging for OOD Generalization
Alexandre Ramé · Matthieu Kirchmeyer · Thibaud J Rahier · Alain Rakotomamonjy · Patrick Gallinari · Matthieu Cord

Standard neural networks struggle to generalize under distribution shifts. For out-of-distribution generalization in computer vision, the best current approach averages the weights along a training run. Previous papers argue that weight averaging (WA) succeeds because it flattens the loss landscape. Our paper highlights the limitations of this analysis and proposes a new one based on WA's similarities with functional ensembling. We provide a new bias-variance-covariance-locality decomposition of WA's expected error: it explains WA's success especially when the marginal distribution changes at test time. Our analysis deepens the understanding of WA and more generally of deep networks under distribution shifts.
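The link between weight averaging and functional ensembling mentioned in the abstract can be illustrated with a toy sketch. It is not the authors' method or code: it assumes a plain linear model, and the names `predict`, `average_weights`, `ensemble_predict`, and the random `checkpoints` are hypothetical. For a linear model, predicting with the averaged weights gives the same output as averaging the individual predictions; for nonlinear networks this equality does not hold in general.

```python
import random


def predict(weights, x):
    """Linear model: dot product of a weight vector and an input."""
    return sum(w * xi for w, xi in zip(weights, x))


def average_weights(weight_list):
    """Weight averaging: uniform average of several weight vectors."""
    n = len(weight_list)
    return [sum(ws) / n for ws in zip(*weight_list)]


def ensemble_predict(weight_list, x):
    """Functional ensembling: average of the individual predictions."""
    return sum(predict(w, x) for w in weight_list) / len(weight_list)


rng = random.Random(0)
# Hypothetical checkpoints: three random weight vectors standing in for models.
checkpoints = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]
x = [rng.gauss(0, 1) for _ in range(4)]

wa_pred = predict(average_weights(checkpoints), x)
ens_pred = ensemble_predict(checkpoints, x)

# The two predictions agree up to floating-point error in the linear case.
print(abs(wa_pred - ens_pred) < 1e-9)
```

The linear case is chosen only because the equality is exact there, which makes the similarity easy to see.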

Author Information

Alexandre Ramé (LIP6)
Matthieu Kirchmeyer (Sorbonne Université & Criteo AI Lab)
Thibaud J Rahier (INRIA)
Alain Rakotomamonjy (Criteo)
Patrick Gallinari (Criteo Research)
Matthieu Cord (Sorbonne University)
