Timezone: »
Poster
Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent
Sharan Vaswani · Benjamin Dubois-Taine · Reza Babanezhad
We aim to make stochastic gradient descent (SGD) adaptive to (i) the noise $\sigma^2$ in the stochastic gradients and (ii) problem-dependent constants. When minimizing smooth, strongly-convex functions with condition number $\kappa$, we prove that $T$ iterations of SGD with exponentially decreasing step-sizes and knowledge of the smoothness can achieve an $\tilde{O} \left(\exp \left( \nicefrac{-T}{\kappa} \right) + \nicefrac{\sigma^2}{T} \right)$ rate, without knowing $\sigma^2$. In order to be adaptive to the smoothness, we use a stochastic line-search (SLS) and show (via upper and lower-bounds) that SGD with SLS converges at the desired rate, but only to a neighbourhood of the solution. On the other hand, we prove that SGD with an offline estimate of the smoothness converges to the minimizer. However, its rate is slowed down proportional to the estimation error. Next, we prove that SGD with Nesterov acceleration and exponential step-sizes (referred to as ASGD) can achieve the near-optimal $\tilde{O} \left(\exp \left( \nicefrac{-T}{\sqrt{\kappa}} \right) + \nicefrac{\sigma^2}{T} \right)$ rate, without knowledge of $\sigma^2$. When used with offline estimates of the smoothness and strong-convexity, ASGD still converges to the solution, albeit at a slower rate. Finally, we empirically demonstrate the effectiveness of exponential step-sizes coupled with a novel variant of SLS.
Author Information
Sharan Vaswani (Simon Fraser University)
Benjamin Dubois-Taine (CNRS, ENS Paris, Inria)
Reza Babanezhad (Samsung)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Oral: Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent »
Wed. Jul 20th 03:15 -- 03:35 PM Room Room 327 - 329
More from the Same Authors
-
2021 : A functional mirror ascent view of policy gradient methods with function approximation »
Sharan Vaswani · Olivier Bachem · Simone Totaro · Matthieu Geist · Marlos C. Machado · Pablo Samuel Castro · Nicolas Le Roux -
2023 : Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees »
Sharan Vaswani · Amirreza Kazemi · Reza Babanezhad · Nicolas Le Roux -
2023 Poster: Fast Online Node Labeling for Very Large Graphs »
Baojian Zhou · Yifan Sun · Reza Babanezhad -
2023 Poster: Target-based Surrogates for Stochastic Optimization »
Jonathan Lavington · Sharan Vaswani · Reza Babanezhad · Mark Schmidt · Nicolas Le Roux -
2021 Poster: Infinite-Dimensional Optimization for Zero-Sum Games via Variational Transport »
Lewis Liu · Yufeng Zhang · Zhuoran Yang · Reza Babanezhad · Zhaoran Wang -
2021 Spotlight: Infinite-Dimensional Optimization for Zero-Sum Games via Variational Transport »
Lewis Liu · Yufeng Zhang · Zhuoran Yang · Reza Babanezhad · Zhaoran Wang