Poster
Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances
Bugra Can · Mert Gurbuzbalaban · Lingjiong Zhu
Momentum methods such as Polyak's heavy-ball (HB) method, Nesterov's accelerated gradient (AG) method, and the accelerated projected gradient (APG) method are commonly used in machine learning practice, but their performance is quite sensitive to noise in the gradients. We study these methods under a first-order stochastic oracle model where noisy estimates of the gradients are available. For strongly convex problems, we show that the distribution of the iterates of AG converges at the accelerated $O(\sqrt{\kappa}\log(1/\varepsilon))$ linear rate to a ball of radius $\varepsilon$ (in the 1-Wasserstein metric) centered at a unique invariant distribution, where $\kappa$ is the condition number, provided that the noise variance is smaller than an explicit upper bound that we provide. Our analysis also certifies linear convergence rates as a function of the stepsize, the momentum parameter, and the noise variance, recovering the accelerated rates in the noiseless case and quantifying the level of noise that can be tolerated to achieve a given performance. To the best of our knowledge, these are the first linear convergence results for stochastic momentum methods under the stochastic oracle model. We also develop finer results for the special case of quadratic objectives, and extend our results to the APG method and to weakly convex functions, showing accelerated rates when the noise magnitude is sufficiently small.
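As an illustration of the setting (not code from the paper), the sketch below runs Nesterov's AG with a noisy first-order oracle on a strongly convex quadratic. The problem data Q, the stepsize alpha = 1/L, the momentum parameter beta = (sqrt(kappa)-1)/(sqrt(kappa)+1), and the noise level sigma are all illustrative assumptions, not values or tuning rules taken from the paper.

```python
import numpy as np

# Minimal sketch, assuming a quadratic f(x) = 0.5 * x' Q x with
# eigenvalues of Q in [mu, L], and additive Gaussian gradient noise.
rng = np.random.default_rng(0)
d = 20
Q = np.diag(np.linspace(1.0, 100.0, d))
mu, L = 1.0, 100.0
kappa = L / mu                                       # condition number

alpha = 1.0 / L                                      # stepsize (assumed choice)
beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)   # momentum (assumed choice)
sigma = 0.1                                          # noise std (assumed level)

x = x_prev = rng.standard_normal(d)
for k in range(500):
    y = x + beta * (x - x_prev)                      # extrapolation step
    grad = Q @ y + sigma * rng.standard_normal(d)    # noisy first-order oracle
    x_prev, x = x, y - alpha * grad                  # gradient step at y

print(f"final distance to optimum: {np.linalg.norm(x):.3e}")
# With sigma = 0 the iterates contract at the accelerated linear rate;
# with sigma > 0 they settle in a neighborhood of the optimum whose
# size grows with sigma, mirroring the paper's noise-tolerance picture.
```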
Author Information
Bugra Can (Rutgers University)
Mert Gurbuzbalaban (Rutgers University)
Lingjiong Zhu (Florida State University)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Oral: Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances »
  Wed. Jun 12th 11:00 -- 11:20 PM, Room 103
More from the Same Authors
- 2023 Poster: Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions »
  Anant Raj · Lingjiong Zhu · Mert Gurbuzbalaban · Umut Simsekli
- 2021 Poster: Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections »
  Alexander D Camuto · Xiaoyu Wang · Lingjiong Zhu · Christopher Holmes · Mert Gurbuzbalaban · Umut Simsekli
- 2021 Spotlight: Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections »
  Alexander D Camuto · Xiaoyu Wang · Lingjiong Zhu · Christopher Holmes · Mert Gurbuzbalaban · Umut Simsekli
- 2021 Poster: The Heavy-Tail Phenomenon in SGD »
  Mert Gurbuzbalaban · Umut Simsekli · Lingjiong Zhu
- 2021 Spotlight: The Heavy-Tail Phenomenon in SGD »
  Mert Gurbuzbalaban · Umut Simsekli · Lingjiong Zhu
- 2020 Poster: Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise »
  Umut Simsekli · Lingjiong Zhu · Yee-Whye Teh · Mert Gurbuzbalaban
- 2019 Poster: A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks »
  Umut Simsekli · Levent Sagun · Mert Gurbuzbalaban
- 2019 Oral: A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks »
  Umut Simsekli · Levent Sagun · Mert Gurbuzbalaban