On the Effectiveness of Sharpness-Aware Minimization with Large Mini-batches
Jinseok Chung · Seonghwan Park · Jaeho Lee · Namhoon Lee

Training with large mini-batches can increase hardware utilization and reduce training time. However, recent studies suggest that using large mini-batches often yields convergence to sharp minima, leading to poor generalization. In this work, we investigate the effectiveness of sharpness minimization for large-batch training. Specifically, we evaluate the sharpness-aware minimization (SAM) algorithm and compare it to standard stochastic gradient descent (SGD) under fixed step size settings. We perform an exhaustive grid search to find optimal hyperparameters in this process. As a result, we find that SAM consistently outperforms SGD, but suffers critical performance degradation in the large-batch training regime.
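For readers unfamiliar with the algorithm compared against SGD in the abstract, the following is a minimal sketch of one SAM update step, written against a generic gradient function; the function and parameter names (`sam_step`, `grad_fn`, `rho`) are illustrative choices, not the authors' implementation. SAM first ascends to an approximate worst-case point within a small neighborhood, then applies the SGD update using the gradient evaluated there.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware minimization (SAM) step (illustrative sketch).

    grad_fn(w) returns the mini-batch gradient of the loss at w;
    rho is the neighborhood radius, lr the SGD step size.
    """
    g = grad_fn(w)
    # Ascent step: move to the approximate worst-case point in the rho-ball.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descent step: update w with the gradient taken at the perturbed point.
    return w - lr * grad_fn(w + eps)

# Toy example: quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is w.
w = np.array([1.0, -2.0])
w_next = sam_step(w, lambda v: v)
```

Note that each SAM step requires two gradient evaluations (one at `w`, one at the perturbed point), roughly doubling the per-step cost relative to SGD.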

Author Information

Jinseok Chung (POSTECH)
Seonghwan Park (POSTECH)
Jaeho Lee (POSTECH)
Namhoon Lee (POSTECH)
