Understanding SAM through Minimax Perspective
Ying Chen ⋅ Aoxi Li ⋅ Javad Lavaei
Abstract
Sharpness-Aware Minimization (SAM) empirically boosts generalization by seeking parameters that minimize the worst-case loss in a small neighborhood, yet existing theory explains its behavior only under either strong convexity or a small perturbation radius. We revisit SAM through the bilevel minimax problem $\min_{\theta}\max_{\|\Delta\|\le\rho}l(\theta+\Delta)$ and derive a $(\theta,\Delta)$ gradient flow ODE whose equilibria coincide with the problem's optimality conditions. A Lyapunov argument—free of convexity assumptions—quantifies how the optimality gap depends on the radius~$\rho$ and the local curvature. Discretizing the flow yields a \emph{Multi-step SAM} algorithm that recovers classical SAM as $\rho\!\to\!0$. Moreover, our analysis and the resulting algorithm remain valid even for large $\rho$, providing principled guidance for aggressive neighborhood exploration. Experiments on synthetic objectives and CIFAR-10 validate the predicted gains from multiple inner updates, bridging the gap between SAM's minimax intuition and its practical implementation.
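To make the minimax structure concrete, the following is a minimal NumPy sketch of the Multi-step SAM idea described above: the inner maximization $\max_{\|\Delta\|\le\rho} l(\theta+\Delta)$ is approximated by several projected gradient-ascent steps on $\Delta$, followed by an outer descent step at $\theta+\Delta$. The step sizes, step counts, and function names are illustrative assumptions, not the paper's exact algorithm; with a single normalized inner step this reduces to classical SAM.

```python
import numpy as np

def multistep_sam_step(theta, grad, rho=0.05, lr=0.1,
                       inner_steps=3, inner_lr=0.02):
    """One outer update of a Multi-step SAM sketch (illustrative only).

    Approximately solves max_{||delta|| <= rho} l(theta + delta) by
    projected gradient ascent on delta, then takes a descent step
    on theta evaluated at the perturbed point theta + delta.
    """
    delta = np.zeros_like(theta)
    for _ in range(inner_steps):
        # Ascent step on the perturbation (inner maximization).
        delta = delta + inner_lr * grad(theta + delta)
        # Project back onto the ball of radius rho.
        norm = np.linalg.norm(delta)
        if norm > rho:
            delta = delta * (rho / norm)
    # Outer minimization step, using the gradient at the perturbed point.
    return theta - lr * grad(theta + delta)

# Toy convex objective l(theta) = 0.5 * ||theta||^2, so grad(theta) = theta.
theta = np.array([1.0, -2.0])
for _ in range(50):
    theta = multistep_sam_step(theta, grad=lambda t: t)
print(np.linalg.norm(theta))  # shrinks toward 0 on this toy problem
```

Setting `inner_steps=1` and replacing the ascent update with `delta = rho * g / np.linalg.norm(g)` for `g = grad(theta)` recovers the classical single-step SAM perturbation.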