Stochastic Gradient Methods under Heavy-Tailed Noises in Weakly Convex Optimization
Tianxi Zhu ⋅ Yi Xu ⋅ Cheems Wang ⋅ Xiangyang Ji
Abstract
Recently, many empirical studies have shown that, in machine learning, the noise distribution of stochastic gradients often exhibits heavy tails when stochastic optimization methods are employed. Most existing theoretical analyses of heavy-tailed stochastic methods rely on various convexity and smoothness assumptions, and our knowledge of how heavy-tailed stochastic methods behave in the setting of weakly convex optimization is still limited. In the weakly convex setting, this paper derives new upper bounds on the convergence of stochastic gradient descent (SGD) under heavy-tailed noise. In particular, for vanilla SGD, we establish an in-expectation convergence guarantee on a bounded constrained domain under the assumption of a bounded $p$-th central moment ($p$-BCM) of the gradient noise, and a high-probability guarantee on an unbounded domain when the noise follows a heavy-tailed sub-Weibull distribution. By equipping SGD with gradient clipping (Clip-SGD), we demonstrate that it achieves high-probability convergence on an unbounded domain under $p$-BCM gradient noise. All of our high-probability convergence bounds depend on the failure probability only through poly-logarithmic factors. Finally, we present numerical experiments to validate our theoretical findings.
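To make the Clip-SGD update described in the abstract concrete, here is a minimal sketch of SGD with norm-based gradient clipping on a toy problem with heavy-tailed gradient noise. All function and parameter names (`clip_sgd`, `clip_level`, `step_size`) are illustrative assumptions, not notation from the paper, and the toy Student-t noise model is chosen only because it has a bounded $p$-th central moment for $p < 2$ while its variance is infinite:

```python
import numpy as np

def clip_sgd(grad_fn, x0, step_size=0.05, clip_level=2.0, n_steps=2000, seed=0):
    """Sketch of SGD with gradient clipping (Clip-SGD).

    grad_fn(x, rng) returns a stochastic gradient at x. Clipping rescales
    any gradient whose norm exceeds clip_level, which bounds the effect of
    heavy-tailed noise on each update.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        g = grad_fn(x, rng)
        norm = np.linalg.norm(g)
        if norm > clip_level:
            g = g * (clip_level / norm)   # clip: rescale to norm clip_level
        x = x - step_size * g             # standard SGD step
    return x

# Toy example: minimize f(x) = 0.5 * ||x||^2 under heavy-tailed noise.
def noisy_grad(x, rng):
    # Student-t noise with df=2 has infinite variance but a finite
    # p-th central moment for every p < 2.
    return x + rng.standard_t(df=2, size=x.shape)

x_final = clip_sgd(noisy_grad, x0=np.array([5.0, -5.0]))
```

Despite the infinite-variance noise, the clipped iterates settle near the minimizer at the origin; removing the clipping step exposes the update to occasional extreme gradients, which is the failure mode the paper's high-probability analysis addresses.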