Poster in Workshop: High-dimensional Learning Dynamics Workshop: The Emergence of Structure and Reasoning
Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances
Marcel Kühn · Bernd Rosenow
Stochastic gradient descent (SGD) is a fundamental optimization method for training neural networks, yet the noise it introduces is often assumed to be uncorrelated in time. This paper challenges that assumption by examining epoch-based noise correlations in discrete-time SGD with momentum under a quadratic loss. Assuming that the noise is independent of small fluctuations in the weight vector, we calculate the exact autocorrelation of the noise and find that SGD noise is anti-correlated in time. We explore the impact of these anti-correlations on SGD dynamics and find that, for directions with curvature below a hyperparameter-dependent crossover value, the weight variance is significantly reduced compared with the case of uncorrelated noise. This reduction leads to decreased loss fluctuations, which we relate to SGD’s ability to find flat minima, thereby enhancing generalization performance.
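The anti-correlation has an elementary source worth noting: within one epoch the mini-batches partition the training set, so the noise terms, defined at a fixed weight vector as the deviation of each mini-batch gradient from the full-batch gradient, sum exactly to zero over the epoch. With M equal-sized batches per epoch, exchangeability of the batch positions then forces a pairwise correlation of -1/(M-1) between distinct steps of the same epoch. The sketch below is an illustration of this mechanism only, not the paper's code: the 1-D quadratic setup, the sample sizes, and all variable names are assumptions, and the weight is held fixed in keeping with the abstract's assumption that the noise is independent of small weight fluctuations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D quadratic loss L(w) = (1/2N) * sum_i (x_i * w - y_i)^2,
# so the per-sample gradient is g_i(w) = x_i * (x_i * w - y_i).
N, M, epochs = 512, 32, 4000     # samples, batches per epoch, epochs
batch = N // M
x = rng.normal(size=N)
y = rng.normal(size=N)
w = 0.3                          # fixed weight: we probe the noise statistics,
                                 # not the optimization dynamics

g_i = x * (x * w - y)            # per-sample gradients at the fixed w
g_full = g_i.mean()              # full-batch gradient

# SGD noise eps = (mini-batch gradient) - (full-batch gradient), with a
# fresh permutation of the data each epoch (sampling without replacement).
noise = np.empty((epochs, M))
for e in range(epochs):
    perm = rng.permutation(N)
    for b in range(M):
        idx = perm[b * batch:(b + 1) * batch]
        noise[e, b] = g_i[idx].mean() - g_full

# Within-epoch autocorrelation: average over all same-epoch step pairs.
var = noise.var()
for lag in range(4):
    c = 1.0 if lag == 0 else np.mean(noise[:, :-lag] * noise[:, lag:]) / var
    theory = 1.0 if lag == 0 else -1.0 / (M - 1)
    print(f"lag {lag}: {c:+.4f}   (zero-sum prediction: {theory:+.4f})")
```

With M = 32 batches per epoch, the printed correlations at nonzero lags cluster around -1/31 ≈ -0.032, illustrating the within-epoch anti-correlation; how this anti-correlation feeds into reduced weight variance below the crossover curvature, and into SGD's preference for flat minima, is the subject of the paper itself.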