Poster
in
Workshop: HiLD: High-dimensional Learning Dynamics Workshop
On the Effectiveness of Sharpness-Aware Minimization with Large Mini-batches
Jinseok Chung · Seonghwan Park · Jaeho Lee · Namhoon Lee
Abstract:
Training with large mini-batches can increase hardware utilization and reduce training time. However, recent studies suggest that using large mini-batches often yields convergence to sharp minima, leading to poor generalization. In this work, we investigate the effectiveness of sharpness minimization for large-batch training. Specifically, we evaluate the sharpness-aware minimization (SAM) algorithm and compare it to standard stochastic gradient descent (SGD) under fixed step size settings. We perform an exhaustive grid search to select optimal hyperparameters in this process. As a result, we find that SAM consistently outperforms SGD, but undergoes critical performance degradation in the large-batch training regime.
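For reference, the SAM update first perturbs the weights along the normalized gradient direction (an ascent step of radius ρ) and then descends using the gradient computed at the perturbed point. Below is a minimal sketch of one SAM step versus one SGD step on a toy quadratic loss; the loss function, learning rate, and ρ are illustrative assumptions, not the paper's experimental settings.

```python
import numpy as np

def loss(w):
    return 0.5 * np.dot(w, w)            # toy loss L(w) = ||w||^2 / 2 (illustrative only)

def grad(w):
    return w                              # gradient of the toy loss

def sgd_step(w, lr=0.1):
    # Standard SGD: descend along the gradient at the current weights.
    return w - lr * grad(w)

def sam_step(w, lr=0.1, rho=0.05):
    # SAM: take an ascent step of radius rho toward a nearby high-loss point,
    # then descend using the gradient evaluated at that perturbed point.
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    g_sam = grad(w + eps)
    return w - lr * g_sam

w = np.array([1.0, -2.0])
print("SGD step:", sgd_step(w))
print("SAM step:", sam_step(w))
```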