Stochastic gradient algorithms are often unstable when applied to functions that do not have Lipschitz-continuous and/or bounded gradients. Gradient clipping is a simple and effective technique to stabilize the training process for problems that are prone to the exploding gradient problem. Despite its widespread popularity, the convergence properties of the gradient clipping heuristic are poorly understood, especially for stochastic problems. This paper establishes both qualitative and quantitative convergence results of the clipped stochastic (sub)gradient method (SGD) for non-smooth convex functions with rapidly growing subgradients. Our analyses show that clipping enhances the stability of SGD and that the clipped SGD algorithm enjoys finite convergence rates in many cases. We also study the convergence of a clipped method with momentum, which includes clipped SGD as a special case, for weakly convex problems under standard assumptions. With a novel Lyapunov analysis, we show that the proposed method achieves the best-known rate for the considered class of problems, demonstrating the effectiveness of clipped methods also in this regime. Numerical results confirm our theoretical developments.
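The clipped update rule itself is not spelled out in the abstract; as a rough illustration only, the sketch below shows a generic clipped stochastic (sub)gradient loop on a convex objective with rapidly growing subgradients. The oracle `grad_fn`, the clipping threshold `gamma`, and the diminishing step size are illustrative assumptions, not the paper's exact algorithm or tuning.

```python
import numpy as np

def clipped_sgd(grad_fn, x0, gamma=1.0, lr=0.1, n_iters=1000, seed=0):
    """Minimal sketch of clipped stochastic (sub)gradient descent.

    grad_fn(x, rng) returns a stochastic subgradient at x.
    gamma is the clipping threshold (an illustrative choice).
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for k in range(n_iters):
        g = grad_fn(x, rng)
        norm_g = np.linalg.norm(g)
        # Clip: rescale the subgradient so its norm never exceeds gamma.
        if norm_g > gamma:
            g = g * (gamma / norm_g)
        # Diminishing step size, a standard choice for convex problems.
        x = x - (lr / np.sqrt(k + 1)) * g
    return x

# Example: convex, non-smooth-growth objective f(x) = ||x||^3,
# whose subgradients grow quadratically, with additive noise.
if __name__ == "__main__":
    def grad_fn(x, rng):
        g = 3 * np.linalg.norm(x) * x  # gradient of ||x||^3
        return g + rng.normal(scale=0.1, size=x.shape)

    x_final = clipped_sgd(grad_fn, x0=np.ones(5), gamma=1.0)
    print(np.linalg.norm(x_final))
```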
Author Information
Vien Mai (KTH Royal Institute of Technology)
Mikael Johansson (KTH Royal Institute of Technology)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: Stability and Convergence of Stochastic Gradient Clipping: Beyond Lipschitz Continuity and Smoothness
  Tue. Jul 20th, 04:00 -- 06:00 PM, Room: Virtual
More from the Same Authors
- 2023 Poster: Generalized Polyak Step Size for First Order Optimization with Momentum
  Xiaoyu Wang · Mikael Johansson · Tong Zhang
- 2023 Poster: Delay-agnostic Asynchronous Coordinate Update Algorithm
  Xuyang Wu · Changxin Liu · Sindri Magnússon · Mikael Johansson
- 2022 Poster: Delay-Adaptive Step-sizes for Asynchronous Learning
  Xuyang Wu · Sindri Magnússon · Hamid Reza Feyzmahdavian · Mikael Johansson
- 2022 Spotlight: Delay-Adaptive Step-sizes for Asynchronous Learning
  Xuyang Wu · Sindri Magnússon · Hamid Reza Feyzmahdavian · Mikael Johansson
- 2020 Poster: Anderson Acceleration of Proximal Gradient Methods
  Vien Mai · Mikael Johansson
- 2020 Poster: Convergence of a Stochastic Gradient Method with Momentum for Non-Smooth Non-Convex Optimization
  Vien Mai · Mikael Johansson
- 2019 Poster: Curvature-Exploiting Acceleration of Elastic Net Computations
  Vien Mai · Mikael Johansson
- 2019 Oral: Curvature-Exploiting Acceleration of Elastic Net Computations
  Vien Mai · Mikael Johansson