Sharpness-Aware Minimization (SAM) is a recent training method that relies on worst-case weight perturbations and significantly improves generalization in various settings. We argue that the existing justifications for the success of SAM, which are based on a PAC-Bayes generalization bound and the idea of convergence to flat minima, are incomplete. Moreover, there are no explanations for the success of using m-sharpness in SAM, which has been shown to be essential for generalization. To better understand this aspect of SAM, we theoretically analyze its implicit bias for diagonal linear networks. We prove that SAM always chooses a solution that enjoys better generalization properties than standard gradient descent for a certain class of problems, and that this effect is amplified by using m-sharpness. We further study the properties of the implicit bias on non-linear networks empirically, where we show that fine-tuning a standard model with SAM can lead to significant generalization improvements. Finally, we provide convergence results for SAM on non-convex objectives when used with stochastic gradients. We illustrate these results empirically for deep networks and discuss their relation to the generalization behavior of SAM. The code of our experiments is available at https://github.com/tml-epfl/understanding-sam.
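To make "worst-case weight perturbations" concrete, here is a minimal sketch of one SAM update, assuming a linear model with squared loss. It is not the authors' code (see the repository above for that). The function names `grad_mse` and `sam_step` and the values of `lr` and `rho` are hypothetical, chosen only for illustration. The sketch also assumes that m-sharpness means computing a separate perturbation for each group of `m` examples and averaging the resulting gradients.

```python
import math

def grad_mse(w, xs, ys):
    """Gradient of 0.5 * mean((w.x - y)^2) for a linear model."""
    g = [0.0] * len(w)
    for x, y in zip(xs, ys):
        r = sum(wi * xi for wi, xi in zip(w, x)) - y  # residual
        for j, xj in enumerate(x):
            g[j] += r * xj / len(xs)
    return g

def sam_step(w, xs, ys, lr=0.1, rho=0.05, m=None):
    """One SAM step; m = examples per perturbation (None = whole batch)."""
    m = m or len(xs)
    total = [0.0] * len(w)
    groups = [(xs[i:i + m], ys[i:i + m]) for i in range(0, len(xs), m)]
    for gx, gy in groups:
        # Ascent step: move distance rho along the normalized gradient.
        g = grad_mse(w, gx, gy)
        norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0
        w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
        # Gradient at the perturbed weights, weighted by group size.
        g_adv = grad_mse(w_adv, gx, gy)
        for j, gj in enumerate(g_adv):
            total[j] += gj * len(gx) / len(xs)
    # Descent step from the original weights using the perturbed gradient.
    return [wi - lr * ti for wi, ti in zip(w, total)]
```

With `rho=0` the step reduces to plain gradient descent. Setting `m=1` gives each example its own perturbation, while `m=None` uses a single perturbation for the whole batch.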
Author Information
Maksym Andriushchenko (EPFL)
Nicolas Flammarion (EPFL)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Poster: Towards Understanding Sharpness-Aware Minimization
  Wed. Jul 20th through Thu. Jul 21st, Room Hall E #516
More from the Same Authors
- 2020 Poster: On Convergence-Diagnostic based Step Sizes for Stochastic Gradient Descent
  Scott Pesme · Aymeric Dieuleveut · Nicolas Flammarion