

Poster in Workshop: High-dimensional Learning Dynamics Workshop: The Emergence of Structure and Reasoning

Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances

Marcel Kühn · Bernd Rosenow


Abstract:

Stochastic gradient descent (SGD) is a fundamental optimization method in neural networks, yet the noise it introduces is often assumed to be uncorrelated over time. This paper challenges that assumption by examining epoch-based noise correlations in discrete-time SGD with momentum under a quadratic loss. Assuming that the noise is independent of small fluctuations in the weight vector, we calculate the exact autocorrelation of the noise and find that SGD noise is anti-correlated in time. We explore the impact of these anti-correlations on SGD dynamics, finding that for directions with curvature below a hyperparameter-dependent crossover value, the weight variance is significantly reduced. This reduction leads to decreased loss fluctuations, which we relate to SGD’s ability to find flat minima, thereby enhancing generalization performance.
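The anti-correlation mechanism can be illustrated with a minimal numerical sketch (not from the paper; all names and values below are illustrative). For a toy quadratic loss L(w) = (1/2N) Σᵢ (w − xᵢ)², the minibatch gradient noise is independent of w, matching the abstract's assumption, and with epoch-based sampling (without replacement) the noise terms of one epoch sum to zero, which forces negative autocorrelation at lags within an epoch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: quadratic loss L(w) = (1/2N) * sum_i (w - x_i)^2.
# The minibatch gradient at w is w - mean(x_batch), so the SGD noise
# xi = batch_grad - full_grad = mean(x) - mean(x_batch) is independent of w.
N, B = 1024, 32          # dataset size and batch size (assumed values)
K = N // B               # number of batches per epoch
x = rng.normal(size=N)

num_epochs = 2000
noise = []               # one noise realisation per SGD step
for _ in range(num_epochs):
    perm = rng.permutation(N)            # epoch-based: sample without replacement
    for k in range(K):
        batch = x[perm[k * B:(k + 1) * B]]
        noise.append(x.mean() - batch.mean())
noise = np.asarray(noise).reshape(num_epochs, K)

# Within an epoch the K noise terms sum to zero exactly, so by symmetry
# distinct steps have correlation -1/(K-1), i.e. the noise is anti-correlated
# in time, unlike i.i.d. with-replacement sampling (correlation 0).
var = noise.var()
for lag in range(1, 4):
    cov = np.mean(noise[:, :-lag] * noise[:, lag:])
    print(f"lag {lag}: corr = {cov / var:+.4f}  (prediction: {-1 / (K - 1):+.4f})")
```

Running this prints within-epoch correlations close to −1/(K−1) at every lag, a simple fingerprint of the anti-correlated noise that the paper analyzes exactly for discrete-time SGD with momentum.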
