A longstanding problem in stochastic optimization is proving that RandomShuffle, the withoutreplacement version of SGD, converges faster than the usual withreplacement SGD. Building upon Gurbuzbalaban et el, we present the first (to our knowledge) nonasymptotic results for this problem by proving that after a reasonable number of epochs RandomShuffle converges faster than SGD. Specifically, we prove that for strongly convex, secondorder smooth functions, the iterates of RandomShuffle converge to the optimal solution as O(1/T^2+n^3/T^3), where n is the number of components in the objective, and T is number of iterations. This result implies that after O(\sqrt{n}) epochs, RandomShuffle is strictly better than SGD (which converges as O(1/T)). The key step toward showing this better dependence on T is the introduction of n into the bound; and as our analysis shows, in general a dependence on n is unavoidable without further changes. To understand how RandomShuffle works in practice, we further explore two empirically useful settings: data sparsity and overparameterization. For sparse data, RandomShuffle has the rate O(1/T^2), again strictly better than SGD. Under a setting closely related to overparameterization, RandomShuffle is shown to converge faster than SGD after any arbitrary number of iterations. Finally, we extend the analysis of RandomShuffle to smooth nonconvex and convex functions.
Author Information
Jeff HaoChen (Tsinghua University)
Suvrit Sra (MIT)
Related Events (a corresponding poster, oral, or spotlight)

2019 Poster: Random Shuffling Beats SGD after Finite Epochs »
Wed Jun 12th 06:30  09:00 PM Room Pacific Ballroom
More from the Same Authors

2019 Poster: Escaping Saddle Points with Adaptive Gradient Methods »
Matthew Staib · Sashank Jakkam Reddi · Satyen Kale · Sanjiv Kumar · Suvrit Sra 
2019 Oral: Escaping Saddle Points with Adaptive Gradient Methods »
Matthew Staib · Sashank Jakkam Reddi · Satyen Kale · Sanjiv Kumar · Suvrit Sra 
2019 Poster: Conditional Gradient Methods via Stochastic PathIntegrated Differential Estimator »
Alp Yurtsever · Suvrit Sra · Volkan Cevher 
2019 Oral: Conditional Gradient Methods via Stochastic PathIntegrated Differential Estimator »
Alp Yurtsever · Suvrit Sra · Volkan Cevher