Training with large mini-batches can increase hardware utilization and reduce training time. However, recent studies suggest that large mini-batches often yield convergence to sharp minima, leading to poor generalization. In this work, we investigate the effectiveness of sharpness minimization for large-batch training. Specifically, we evaluate the sharpness-aware minimization (SAM) algorithm and compare it to standard stochastic gradient descent (SGD) under fixed step-size settings, performing an exhaustive grid search to select hyperparameters. We find that SAM consistently outperforms SGD, yet it still undergoes critical performance degradation in the large-batch training regime.
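For context, the sketch below illustrates one SAM update as described in the original SAM paper: take an ascent step of radius rho toward a nearby worst-case point, then apply the base optimizer using the gradient computed at the perturbed weights. This is a minimal, hedged illustration, not the authors' implementation; the function name sam_step, the default rho=0.05, and the use of a PyTorch model with an external base optimizer are assumptions for the example.

```python
import torch

def sam_step(model, loss_fn, inputs, targets, base_optimizer, rho=0.05):
    """One SAM update: ascend to a nearby worst-case point, then descend."""
    # First forward/backward pass: gradient at the current weights w.
    base_optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # Perturb weights: e = rho * grad / ||grad||, moving to w + e.
    with torch.no_grad():
        params = [p for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([p.grad.norm(p=2) for p in params]), p=2)
        perturbations = []
        for p in params:
            e = p.grad * (rho / (grad_norm + 1e-12))
            p.add_(e)
            perturbations.append(e)

    # Second forward/backward pass: gradient at the perturbed weights w + e.
    base_optimizer.zero_grad()
    loss_fn(model(inputs), targets).backward()

    # Restore the original weights, then step the base optimizer
    # with the sharpness-aware gradient left in .grad.
    with torch.no_grad():
        for p, e in zip(params, perturbations):
            p.sub_(e)
    base_optimizer.step()
    return loss.item()
```

In a comparison like the one described above, the base optimizer would be something like torch.optim.SGD with a fixed learning rate, so that the SGD baseline and the SAM variant share the same descent step and differ only in where the gradient is evaluated.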
Author Information
Jinseok Chung (POSTECH)
Seonghwan Park (POSTECH)
Jaeho Lee (POSTECH)
Namhoon Lee (POSTECH)
More from the Same Authors
- 2023 : Effects of Overparameterization on Sharpness-Aware Minimization: A Preliminary Investigation
  Sungbin Shin · Dongyeop Lee · Namhoon Lee
- 2023 : Bias-to-Text: Debiasing Unknown Visual Biases by Language Interpretation
  Younghyun Kim · Sangwoo Mo · Minkyu Kim · Kyungmin Lee · Jaeho Lee · Jinwoo Shin
- 2023 : Breaking the Spurious Causality of Conditional Generation via Fairness Intervention with Corrective Sampling
  Jun Hyun Nam · Sangwoo Mo · Jaeho Lee · Jinwoo Shin
- 2023 : FedFwd: Federated Learning without Backpropagation
  Seonghwan Park · Dahun Shin · Jinseok Chung · Namhoon Lee
- 2023 : Semi-supervised Concept Bottleneck Models
  Jeeon Bae · Sungbin Shin · Namhoon Lee
- 2023 Poster: Modality-Agnostic Variational Compression of Implicit Neural Representations
  Jonathan Richard Schwarz · Jihoon Tack · Yee-Whye Teh · Jaeho Lee · Jinwoo Shin
- 2023 Poster: A Closer Look at the Intervention Procedure of Concept Bottleneck Models
  Sungbin Shin · Yohan Jo · Sungsoo Ahn · Namhoon Lee