The Sharpness Aware Minimization (SAM) optimization algorithm has been shown to control large eigenvalues of the loss Hessian and provide generalization benefits in a variety of settings. The original motivation for SAM was a modified loss function which penalized sharp minima; subsequent analyses have also focused on the behavior near minima. However, our work reveals that SAM provides a strong regularization of the eigenvalues throughout the learning trajectory. We show that in a simplified setting, SAM dynamically induces a stabilization related to the edge of stability (EOS) phenomenon observed in large learning rate gradient descent. Our theory predicts the largest eigenvalue as a function of the learning rate and SAM radius parameters. Finally, we show that practical models can also exhibit this EOS stabilization, and that understanding SAM must account for these dynamics far away from any minima.
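For reference, the SAM update discussed in the abstract takes an ascent step of radius rho along the normalized gradient and then descends using the gradient evaluated at that perturbed point. Below is a minimal NumPy sketch, assuming the standard normalized-ascent form of SAM; the quadratic loss, variable names, and hyperparameter values are illustrative and not taken from the paper.

    import numpy as np

    def sam_step(w, grad_fn, lr, rho, eps=1e-12):
        """One SAM update (normalized-ascent variant).

        grad_fn returns the loss gradient at a parameter vector;
        lr is the learning rate, rho the SAM radius.
        """
        g = grad_fn(w)
        # Ascent to the approximate worst-case point within radius rho.
        w_adv = w + rho * g / (np.linalg.norm(g) + eps)
        # Descend using the gradient evaluated at the perturbed point.
        return w - lr * grad_fn(w_adv)

    # Illustrative quadratic loss L(w) = 0.5 * w^T H w, whose Hessian H has
    # largest eigenvalue 50, making stability easy to reason about.
    H = np.diag([50.0, 1.0])
    grad_fn = lambda w: H @ w

    w = np.array([1.0, 1.0])
    lr, rho = 0.03, 0.05  # illustrative values
    for _ in range(100):
        w = sam_step(w, grad_fn, lr, rho)

On such a quadratic, plain gradient descent is stable only while the product of the learning rate and the largest Hessian eigenvalue stays below 2 (the classical edge-of-stability threshold); the abstract's claim is that SAM's perturbed gradient dynamically stabilizes the largest eigenvalue at a level determined jointly by the learning rate and rho.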
Author Information
Atish Agarwala (Google DeepMind)
Yann Nicolas Dauphin (Google DeepMind)
More from the Same Authors
- 2023 Poster: Tied-Augment: Controlling Representation Similarity Improves Data Augmentation
  Emirhan Kurtulus · Zichao Li · Yann Nicolas Dauphin · Ekin Dogus Cubuk
- 2023 Poster: Second-order regression models exhibit progressive sharpening to the edge of stability
  Atish Agarwala · Fabian Pedregosa · Jeffrey Pennington
- 2022 Poster: Deep equilibrium networks are sensitive to initialization statistics
  Atish Agarwala · Samuel Schoenholz
- 2022 Spotlight: Deep equilibrium networks are sensitive to initialization statistics
  Atish Agarwala · Samuel Schoenholz