A longstanding problem in stochastic optimization is proving that \rsgd, the without-replacement version of \sgd, converges faster than the usual with-replacement \sgd. Building upon~\citep{gurbuzbalaban2015random}, we present the \emph{first} (to our knowledge) non-asymptotic results for this problem, proving that after a reasonable number of epochs \rsgd converges faster than \sgd. Specifically, we prove that for strongly convex, second-order smooth functions, the iterates of \rsgd converge to the optimal solution at the rate $\mathcal{O}(\nicefrac{1}{T^2} + \nicefrac{n^3}{T^3})$, where $n$ is the number of components in the objective and $T$ is the number of iterations. This result implies that after $\mathcal{O}(\sqrt{n})$ epochs, \rsgd is \emph{strictly better} than \sgd (which converges as $\mathcal{O}(\nicefrac{1}{T})$): both terms of the bound fall below $\nicefrac{1}{T}$ once $T$ exceeds $n^{3/2}$, i.e., once the number of epochs $T/n$ exceeds $\sqrt{n}$. The key step toward showing this better dependence on $T$ is the introduction of $n$ into the bound; and as our analysis shows, such a dependence on $n$ is in general unavoidable without further assumptions. To understand how \rsgd behaves in practice, we further explore two empirically relevant settings: data sparsity and overparameterization. For sparse data, \rsgd attains the rate $\mathcal{O}(\nicefrac{1}{T^2})$, again strictly better than \sgd. Under a setting closely related to overparameterization, \rsgd is shown to converge faster than \sgd after \emph{any} number of iterations. Finally, we extend the analysis of \rsgd to smooth nonconvex and convex functions.
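To make the with- vs. without-replacement distinction concrete, here is a minimal Python sketch contrasting the two samplers on a toy strongly convex least-squares objective. The problem instance, function names (`grad_i`, `sgd`, `random_shuffle`), and hyperparameters are illustrative assumptions of ours, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy strongly convex finite sum: f(x) = (1/n) * sum_i f_i(x),
# with f_i(x) = 0.5 * (a_i^T x - b_i)^2 and n > d, so f has a unique minimizer.
n, d = 50, 10
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.5 * rng.normal(size=n)  # noisy targets
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)         # exact minimizer

def grad_i(x, i):
    """Gradient of the i-th component f_i."""
    return (A[i] @ x - b[i]) * A[i]

def sgd(epochs, lr):
    """With-replacement SGD: an i.i.d. uniform component index per step."""
    x = np.zeros(d)
    for _ in range(epochs * n):
        x -= lr * grad_i(x, rng.integers(n))
    return x

def random_shuffle(epochs, lr):
    """Without-replacement SGD (RandomShuffle): draw a fresh random
    permutation of the n components each epoch, then make one pass."""
    x = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            x -= lr * grad_i(x, i)
    return x

epochs, lr = 300, 0.01  # illustrative hyperparameters only
for name, run in [("SGD", sgd), ("RandomShuffle", random_shuffle)]:
    err = np.linalg.norm(run(epochs, lr) - x_star)
    print(f"{name:>13s}: ||x_T - x*|| = {err:.2e}")
```

With a constant step size, the without-replacement variant typically settles closer to $x^*$ on such toy problems, which is consistent with (though of course no substitute for) the improved dependence on $T$ proved in the paper.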
Author Information
Jeff HaoChen (Tsinghua University)
Suvrit Sra (MIT)
Related Events (a corresponding poster, oral, or spotlight)

2019 Oral: Random Shuffling Beats SGD after Finite Epochs
Wed Jun 12th, 04:40–05:00 PM, Room 103