Flatness-Aware Stochastic Gradient Langevin Dynamics
Stefano Bruno ⋅ Youngsik Hwang ⋅ JaeHyeon An ⋅ Sotirios Sabanis ⋅ Dongyoung Lim
Abstract
Flatness of the loss landscape has been widely studied as an important perspective for understanding the behavior and generalization of deep learning algorithms. Motivated by this view, we propose Flatness-Aware Stochastic Gradient Langevin Dynamics (fSGLD), a first-order optimization method that biases its learning dynamics toward flat basins while retaining the computational and memory efficiency of SGD and SGLD. We provide a non-asymptotic theoretical analysis showing that fSGLD converges to a flatness-biased Gibbs distribution under a theoretically prescribed coupling between the noise scale $\sigma$ and the inverse temperature $\beta$, together with explicit excess risk guarantees. We empirically evaluate fSGLD across standard optimizer benchmarks, Bayesian image classification, uncertainty quantification, and out-of-distribution detection, demonstrating consistently strong performance and reliable uncertainty estimates. Additional experiments confirm the effectiveness of the theoretically prescribed $\beta$–$\sigma$ coupling compared to decoupled choices.
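To make the SGLD-style template concrete, the following is a minimal illustrative sketch on a toy quadratic loss. It assumes (rather than reproduces) a plausible flatness-aware variant in which the stochastic gradient is evaluated at Gaussian-perturbed weights with noise scale $\sigma$ and Langevin noise is injected at inverse temperature $\beta$; the actual fSGLD update rule and the prescribed $\beta$–$\sigma$ coupling are defined in the paper, and the values below are placeholders.

```python
import numpy as np

# Illustrative sketch only: this is NOT the paper's exact fSGLD algorithm.
# It combines (i) a gradient evaluated at Gaussian-perturbed weights
# (noise scale sigma), which approximates the gradient of a smoothed loss
# and hence favors flat basins, with (ii) standard SGLD Langevin noise at
# inverse temperature beta.

rng = np.random.default_rng(0)

def grad_loss(theta, A):
    """Gradient of the toy quadratic loss L(theta) = 0.5 * theta^T A theta."""
    return A @ theta

def flatness_aware_sgld_step(theta, A, lr, sigma, beta, rng):
    # Gradient at a Gaussian-perturbed iterate (flatness bias, assumed form).
    eps = rng.normal(size=theta.shape)
    g = grad_loss(theta + sigma * eps, A)
    # Langevin noise term of SGLD at inverse temperature beta.
    xi = rng.normal(size=theta.shape)
    return theta - lr * g + np.sqrt(2.0 * lr / beta) * xi

# Toy problem with one sharp and one flat direction.
A = np.diag([10.0, 0.1])
theta = np.array([3.0, 3.0])
lr, sigma, beta = 1e-2, 0.1, 1e4  # placeholder values; the paper couples beta and sigma

for _ in range(2000):
    theta = flatness_aware_sgld_step(theta, A, lr, sigma, beta, rng)

print("final iterate:", theta)
```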