Timezone: »

SGD: General Analysis and Improved Rates
Robert M. Gower · Nicolas Loizou · Xun Qian · Alibek Sailanbayev · Egor Shulgin · Peter Richtarik

Tue Jun 11 02:40 PM -- 03:00 PM (PDT) @ Room 103

We propose a general yet simple theorem describing the convergence of SGD under the arbitrary sampling paradigm. Our theorem describes the convergence of an infinite array of variants of SGD, each of which is associated with a specific probability law governing the data selection rule used to form minibatches. This is the first time such an analysis is performed, and most of our variants of SGD were never explicitly considered in the literature before. Our analysis relies on the recently introduced notion of expected smoothness and does not rely on a uniform bound on the variance of the stochastic gradients. By specializing our theorem to different mini-batching strategies, such as sampling with replacement and independent sampling, we derive exact expressions for the stepsize as a function of the mini-batch size. With this we can also determine the mini-batch size that optimizes the total complexity, and show explicitly that as the variance of the stochastic gradient evaluated at the minimum grows, so does the optimal mini-batch size. For zero variance, the optimal mini-batch size is one. Moreover, we prove insightful stepsize-switching rules which describe when one should switch from a constant to a decreasing stepsize regime.

Author Information

Robert M. Gower (Telecom Paristech)


Nicolas Loizou (The University of Edinburgh)


Xun Qian (KAUST)
Alibek Sailanbayev (King Abdullah University of Science and Technology)
Egor Shulgin (Moscow Institute of Physics and Technology)

I am a Master's student in Computer Science at King Abdullah University of Science and Technology (KAUST) advised by [Peter Richtárik](https://richtarik.org/). Prior to that, I obtained BSc in Applied Mathematics, Computer Science, and Physics from Moscow Institute of Physics and Technology in 2019.

Peter Richtarik (KAUST)

Peter Richtarik is an Associate Professor of Computer Science and Mathematics at KAUST and an Associate Professor of Mathematics at the University of Edinburgh. He is an EPSRC Fellow in Mathematical Sciences, Fellow of the Alan Turing Institute, and is affiliated with the Visual Computing Center and the Extreme Computing Research Center at KAUST. Dr. Richtarik received his PhD from Cornell University in 2007, and then worked as a Postdoctoral Fellow in Louvain, Belgium, before joining Edinburgh in 2009, and KAUST in 2017. Dr. Richtarik's research interests lie at the intersection of mathematics, computer science, machine learning, optimization, numerical linear algebra, high performance computing and applied probability. Through his recent work on randomized decomposition algorithms (such as randomized coordinate descent methods, stochastic gradient descent methods and their numerous extensions, improvements and variants), he has contributed to the foundations of the emerging field of big data optimization, randomized numerical linear algebra, and stochastic methods for empirical risk minimization. Several of his papers attracted international awards, including the SIAM SIGEST Best Paper Award, the IMA Leslie Fox Prize (2nd prize, twice), and the INFORMS Computing Society Best Student Paper Award (sole runner up). He is the founder and organizer of the Optimization and Big Data workshop series.​

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors