

Poster

Forget Sharpness: Perturbed Forgetting of Model Biases Within SAM Dynamics

Ankit Vani · Frederick Tung · Gabriel Oliveira · Hossein Sharifi-Noghabi


Abstract:

Despite attaining high empirical generalization, the sharpness of models trained with sharpness-aware minimization (SAM) does not always correlate with generalization error. Instead of viewing SAM as minimizing sharpness to improve generalization, our paper considers a new perspective based on the dynamics of SAM. We propose that perturbations in SAM perform perturbed forgetting, wherein they discard undesirable model biases to exhibit learning signals that generalize better. We relate our notion of forgetting to the information bottleneck principle and use it to explain earlier observations, such as why computing perturbations from small batches generalizes better. Under our perspective, standard SAM targets model biases exposed through steepest ascent directions, and we propose a new perturbation that targets biases encoded in the model's outputs. Our output-bias-forgetting perturbations outperform standard SAM and GSAM on ImageNet and robustness benchmarks, and allow for improved transfer learning to CIFAR-10 and CIFAR-100, while often converging to sharper loss regions. Our results support the view that the benefits of SAM can be explained by alternative mechanistic principles that do not require flatness of the loss surface.
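For context on the "steepest ascent" perturbation the abstract contrasts against, below is a minimal sketch of a standard SAM-style update in PyTorch. It illustrates the conventional two-pass procedure (ascend along the normalized gradient, then descend from the perturbed weights); it is not the paper's proposed output-bias-forgetting perturbation, and the function name `sam_step` and the default `rho=0.05` are illustrative assumptions rather than anything specified on this page.

```python
import torch

def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    """One standard SAM-style update (illustrative sketch):
    perturb weights along the normalized steepest-ascent direction,
    then take the optimizer step using gradients from the perturbed point."""
    # First forward/backward pass: gradients at the current weights.
    loss = loss_fn(model(x), y)
    loss.backward()

    # Perturbation epsilon = rho * g / ||g||, applied in place to each parameter.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            eps = rho * p.grad / (grad_norm + 1e-12)
            p.add_(eps)
            perturbations.append((p, eps))
    model.zero_grad()

    # Second pass: gradients at the perturbed weights drive the actual update.
    loss_fn(model(x), y).backward()

    # Restore the original weights before stepping the base optimizer.
    with torch.no_grad():
        for p, eps in perturbations:
            p.sub_(eps)
    base_optimizer.step()
    model.zero_grad()
    return loss.item()
```

Under the paper's perspective, the first pass above is where the perturbation "exposes" model biases; the proposed method replaces this gradient-ascent step with one that targets biases encoded in the model's outputs, whose exact form is described in the paper rather than here.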
