A long-standing problem in stochastic optimization is proving that RandomShuffle, the without-replacement version of SGD, converges faster than the usual with-replacement SGD. Building upon Gürbüzbalaban et al., we present the first (to our knowledge) non-asymptotic results for this problem, proving that after a reasonable number of epochs RandomShuffle converges faster than SGD. Specifically, we prove that for strongly convex, second-order smooth functions, the iterates of RandomShuffle converge to the optimal solution at the rate O(1/T^2 + n^3/T^3), where n is the number of components in the objective and T is the number of iterations. This result implies that after O(\sqrt{n}) epochs, RandomShuffle is strictly better than SGD (which converges as O(1/T)). The key step toward showing this better dependence on T is the introduction of n into the bound; and, as our analysis shows, a dependence on n is in general unavoidable without further changes. To understand how RandomShuffle works in practice, we further explore two empirically useful settings: data sparsity and over-parameterization. For sparse data, RandomShuffle attains the rate O(1/T^2), again strictly better than SGD. Under a setting closely related to over-parameterization, RandomShuffle is shown to converge faster than SGD after any arbitrary number of iterations. Finally, we extend the analysis of RandomShuffle to smooth non-convex and convex functions.
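For concreteness, the sketch below contrasts the two sampling schemes the abstract compares: with-replacement SGD draws a component index i.i.d. at every step, while RandomShuffle draws a fresh permutation each epoch and uses every component exactly once. The toy least-squares objective, step size, and function names here are illustrative assumptions only, not the paper's setup.

```python
import numpy as np

# Minimal sketch (not from the paper): compare with-replacement SGD against
# RandomShuffle (without-replacement SGD) on a toy finite-sum least-squares
# objective f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2, which is smooth and
# strongly convex when A has full column rank.

rng = np.random.default_rng(0)
n, d = 200, 10                                  # n components, d-dimensional iterate
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]   # optimum, for reference


def grad_i(x, i):
    """Gradient of the i-th component f_i(x) = 0.5 * (a_i^T x - b_i)^2."""
    return (A[i] @ x - b[i]) * A[i]


def sgd_with_replacement(epochs, lr=0.01):
    x = np.zeros(d)
    for _ in range(epochs * n):                 # T = epochs * n iterations
        i = rng.integers(n)                     # sample a component i.i.d.
        x -= lr * grad_i(x, i)
    return x


def random_shuffle(epochs, lr=0.01):
    x = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):            # fresh permutation each epoch
            x -= lr * grad_i(x, i)              # each component used exactly once
    return x


for method in (sgd_with_replacement, random_shuffle):
    err = np.linalg.norm(method(epochs=50) - x_star)
    print(f"{method.__name__}: ||x_T - x*|| = {err:.3e}")
```

Both methods perform the same number of component-gradient steps per epoch; the only difference is the sampling scheme, which is exactly the distinction the paper's rates address.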
Author Information
Jeff HaoChen (Tsinghua University)
Suvrit Sra (MIT)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Poster: Random Shuffling Beats SGD after Finite Epochs »
  Thu. Jun 13th 01:30 -- 04:00 AM, Room: Pacific Ballroom #206
More from the Same Authors
- 2023 Poster: Global optimality for Euclidean CCCP under Riemannian convexity »
  Melanie Weber · Suvrit Sra
- 2023 Poster: On the Training Instability of Shuffling SGD with Batch Normalization »
  David X. Wu · Chulhee Yun · Suvrit Sra
- 2022: Sign and Basis Invariant Networks for Spectral Graph Representation Learning »
  Derek Lim · Joshua Robinson · Lingxiao Zhao · Tess Smidt · Suvrit Sra · Haggai Maron · Stefanie Jegelka
- 2022 Poster: Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity »
  Jingzhao Zhang · Hongzhou Lin · Subhro Das · Suvrit Sra · Ali Jadbabaie
- 2022 Spotlight: Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity »
  Jingzhao Zhang · Hongzhou Lin · Subhro Das · Suvrit Sra · Ali Jadbabaie
- 2022 Poster: Neural Network Weights Do Not Converge to Stationary Points: An Invariant Measure Perspective »
  Jingzhao Zhang · Haochuan Li · Suvrit Sra · Ali Jadbabaie
- 2022 Poster: Understanding the unstable convergence of gradient descent »
  Kwangjun Ahn · Jingzhao Zhang · Suvrit Sra
- 2022 Spotlight: Understanding the unstable convergence of gradient descent »
  Kwangjun Ahn · Jingzhao Zhang · Suvrit Sra
- 2022 Spotlight: Neural Network Weights Do Not Converge to Stationary Points: An Invariant Measure Perspective »
  Jingzhao Zhang · Haochuan Li · Suvrit Sra · Ali Jadbabaie
- 2021 Poster: Provably Efficient Algorithms for Multi-Objective Competitive RL »
  Tiancheng Yu · Yi Tian · Jingzhao Zhang · Suvrit Sra
- 2021 Poster: Online Learning in Unknown Markov Games »
  Yi Tian · Yuanhao Wang · Tiancheng Yu · Suvrit Sra
- 2021 Spotlight: Online Learning in Unknown Markov Games »
  Yi Tian · Yuanhao Wang · Tiancheng Yu · Suvrit Sra
- 2021 Oral: Provably Efficient Algorithms for Multi-Objective Competitive RL »
  Tiancheng Yu · Yi Tian · Jingzhao Zhang · Suvrit Sra
- 2021 Poster: Three Operator Splitting with a Nonconvex Loss Function »
  Alp Yurtsever · Varun Mangalick · Suvrit Sra
- 2021 Spotlight: Three Operator Splitting with a Nonconvex Loss Function »
  Alp Yurtsever · Varun Mangalick · Suvrit Sra
- 2020 Poster: Strength from Weakness: Fast Learning Using Weak Supervision »
  Joshua Robinson · Stefanie Jegelka · Suvrit Sra
- 2020 Poster: Complexity of Finding Stationary Points of Nonconvex Nonsmooth Functions »
  Jingzhao Zhang · Hongzhou Lin · Stefanie Jegelka · Suvrit Sra · Ali Jadbabaie
- 2020 Poster: Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition »
  Chi Jin · Tiancheng Jin · Haipeng Luo · Suvrit Sra · Tiancheng Yu
- 2019 Poster: Escaping Saddle Points with Adaptive Gradient Methods »
  Matthew Staib · Sashank Jakkam Reddi · Satyen Kale · Sanjiv Kumar · Suvrit Sra
- 2019 Oral: Escaping Saddle Points with Adaptive Gradient Methods »
  Matthew Staib · Sashank Jakkam Reddi · Satyen Kale · Sanjiv Kumar · Suvrit Sra
- 2019 Poster: Conditional Gradient Methods via Stochastic Path-Integrated Differential Estimator »
  Alp Yurtsever · Suvrit Sra · Volkan Cevher
- 2019 Oral: Conditional Gradient Methods via Stochastic Path-Integrated Differential Estimator »
  Alp Yurtsever · Suvrit Sra · Volkan Cevher