Why is SAM Robust to Label Noise?
Christina Baek · Zico Kolter · Aditi Raghunathan
Event URL: https://openreview.net/forum?id=lnsWVP4BhP

Sharpness-Aware Minimization (SAM) has recently achieved state-of-the-art generalization performance in both natural image and language tasks. Previous work has largely tried to explain this performance by characterizing SAM's solutions as lying in "flat" (low-curvature) regions of the loss landscape. However, other works have shown that the correlation between various notions of flatness and generalization is weak, casting doubt on this explanation. In this paper, we focus on understanding SAM in the presence of label noise, where its performance gains are especially pronounced. We first show that SAM's improved generalization can already be observed in linear logistic regression, where 1-SAM reduces to simply up-weighting the gradients of correctly labeled points during the early epochs of training. Next, we empirically investigate how SAM's learning dynamics change in neural networks, and find similar behavior in how it handles noisy versus clean samples. We conclude that SAM's gains in the label-noise setting can largely be explained by how it regularizes the speed at which different examples are learned during training.
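
To make the linear setting concrete, below is a minimal sketch of 1-SAM (per-example SAM) for linear logistic regression with labels in {-1, +1}. The function name one_sam_epoch and the hyperparameter values rho and lr are illustrative choices, not from the paper; the sketch shows only the generic 1-SAM update (a normalized per-example ascent step followed by a descent step at the perturbed weights), not the paper's analysis.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_sam_epoch(w, X, y, rho=0.05, lr=0.1):
    """One pass of 1-SAM (per-example SAM) for linear logistic
    regression with labels y in {-1, +1}.

    For each example, SAM first ascends to the adversarially
    perturbed weights w + rho * g / ||g||, then applies the
    gradient evaluated at that perturbed point.
    """
    for x_i, y_i in zip(X, y):
        # per-example logistic-loss gradient at the current weights
        g = -sigmoid(-y_i * (w @ x_i)) * y_i * x_i
        norm = np.linalg.norm(g)
        if norm == 0.0:
            continue
        # ascent step: per-example adversarial perturbation of w
        w_adv = w + rho * g / norm
        # descent step: gradient recomputed at the perturbed weights
        g_adv = -sigmoid(-y_i * (w_adv @ x_i)) * y_i * x_i
        w = w - lr * g_adv
    return w

if __name__ == "__main__":
    # toy label-noise setup: linearly separable data, 20% flipped labels
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    w_true = rng.normal(size=5)
    y = np.sign(X @ w_true)
    flip = rng.random(100) < 0.2
    y[flip] = -y[flip]
    w = np.zeros(5)
    for _ in range(10):
        w = one_sam_epoch(w, X, y)
```

In this sketch, the normalized perturbation simplifies to w_adv = w - rho * y_i * x_i / ||x_i||, which shrinks the example's margin and therefore inflates its gradient magnitude; the multiplicative inflation is largest for low-loss (typically clean) points, consistent with the up-weighting effect described in the abstract.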

Author Information

Christina Baek (Carnegie Mellon University)
Zico Kolter (Carnegie Mellon University / Bosch Center for AI)
Aditi Raghunathan (Carnegie Mellon University)
