Timezone: »

Oral
Random Shuffling Beats SGD after Finite Epochs
Jeff HaoChen · Suvrit Sra

Wed Jun 12 04:40 PM -- 05:00 PM (PDT) @ Room 103

A long-standing problem in stochastic optimization is proving that RandomShuffle, the without-replacement version of SGD, converges faster than the usual with-replacement SGD. Building upon Gurbuzbalaban et el, we present the first (to our knowledge) non-asymptotic results for this problem by proving that after a reasonable number of epochs RandomShuffle converges faster than SGD. Specifically, we prove that for strongly convex, second-order smooth functions, the iterates of RandomShuffle converge to the optimal solution as O(1/T^2+n^3/T^3), where n is the number of components in the objective, and T is number of iterations. This result implies that after O(\sqrt{n}) epochs, RandomShuffle is strictly better than SGD (which converges as O(1/T)). The key step toward showing this better dependence on T is the introduction of n into the bound; and as our analysis shows, in general a dependence on n is unavoidable without further changes. To understand how RandomShuffle works in practice, we further explore two empirically useful settings: data sparsity and over-parameterization. For sparse data, RandomShuffle has the rate O(1/T^2), again strictly better than SGD. Under a setting closely related to over-parameterization, RandomShuffle is shown to converge faster than SGD after any arbitrary number of iterations. Finally, we extend the analysis of RandomShuffle to smooth non-convex and convex functions.