

Poster in Workshop: Principles of Distribution Shift (PODS)

A Bias-Variance Analysis of Weight Averaging for OOD Generalization

Alexandre Ramé · Matthieu Kirchmeyer · Thibaud J Rahier · Alain Rakotomamonjy · Patrick Gallinari · Matthieu Cord


Abstract:

Standard neural networks struggle to generalize under distribution shifts. For out-of-distribution generalization in computer vision, the best current approach averages the weights collected along a training run. Previous papers argue that weight averaging (WA) succeeds because it flattens the loss landscape. Our paper highlights the limitations of this analysis and proposes a new one based on WA's similarity to functional ensembling. We provide a new bias-variance-covariance-locality decomposition of WA's expected error: it explains WA's success, especially when the marginal distribution changes at test time. Our analysis deepens the understanding of WA and, more generally, of deep networks under distribution shifts.
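To make the contrast in the abstract concrete, below is a minimal PyTorch sketch (not the authors' code; the helper names `weight_average` and `functional_ensemble` are hypothetical) of the two procedures being compared: weight averaging, which averages parameters into a single network, and functional ensembling, which averages the predictions of the same checkpoints.

```python
import copy

import torch


def weight_average(models):
    """Weight averaging (WA): average the parameters of several checkpoints
    (e.g. snapshots taken along one training run) into a single network."""
    avg = copy.deepcopy(models[0])
    with torch.no_grad():
        for name, p_avg in avg.named_parameters():
            stacked = torch.stack(
                [dict(m.named_parameters())[name] for m in models]
            )
            p_avg.copy_(stacked.mean(dim=0))
    # Caveat: buffers such as BatchNorm running statistics are simply
    # inherited from the first checkpoint in this sketch.
    return avg


def functional_ensemble(models, x):
    """Functional ensembling: average the predictions of the same checkpoints.
    Needs len(models) forward passes per input, whereas WA needs only one."""
    with torch.no_grad():
        return torch.stack([m(x) for m in models]).mean(dim=0)
```

Intuitively, when the checkpoints remain close in weight space the two procedures agree to first order, which is why an ensemble-style bias-variance-covariance analysis can carry over to WA up to a locality term, while WA keeps the test-time cost of a single forward pass.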
