Timezone: »
We propose a general yet simple theorem describing the convergence of SGD under the arbitrary sampling paradigm. Our theorem describes the convergence of an infinite array of variants of SGD, each of which is associated with a specific probability law governing the data selection rule used to form minibatches. This is the first time such an analysis is performed, and most of our variants of SGD were never explicitly considered in the literature before. Our analysis relies on the recently introduced notion of expected smoothness and does not rely on a uniform bound on the variance of the stochastic gradients. By specializing our theorem to different minibatching strategies, such as sampling with replacement and independent sampling, we derive exact expressions for the stepsize as a function of the minibatch size. With this we can also determine the minibatch size that optimizes the total complexity, and show explicitly that as the variance of the stochastic gradient evaluated at the minimum grows, so does the optimal minibatch size. For zero variance, the optimal minibatch size is one. Moreover, we prove insightful stepsizeswitching rules which describe when one should switch from a constant to a decreasing stepsize regime.
Author Information
Robert Gower (Telecom Paristech)
https://gowerrobert.github.io/
Nicolas Loizou (The University of Edinburgh)
https://www.maths.ed.ac.uk/~s1461357/
Xun Qian (KAUST)
Alibek Sailanbayev (King Abdullah University of Science and Technology)
Egor Shulgin (Moscow Institute of Physics and Technology)
I am a PhD student in Computer Science at King Abdullah University of Science and Technology (KAUST) advised by [Petr Richtrik](https://richtarik.org/). Prior to that, I obtained BSc in Applied Mathematics, Computer Science, and Physics from Moscow Institute of Physics and Technology in 2019.
Peter Richtarik (KAUST)
Peter Richtarik is an Associate Professor of Computer Science and Mathematics at KAUST and an Associate Professor of Mathematics at the University of Edinburgh. He is an EPSRC Fellow in Mathematical Sciences, Fellow of the Alan Turing Institute, and is affiliated with the Visual Computing Center and the Extreme Computing Research Center at KAUST. Dr. Richtarik received his PhD from Cornell University in 2007, and then worked as a Postdoctoral Fellow in Louvain, Belgium, before joining Edinburgh in 2009, and KAUST in 2017. Dr. Richtarik's research interests lie at the intersection of mathematics, computer science, machine learning, optimization, numerical linear algebra, high performance computing and applied probability. Through his recent work on randomized decomposition algorithms (such as randomized coordinate descent methods, stochastic gradient descent methods and their numerous extensions, improvements and variants), he has contributed to the foundations of the emerging field of big data optimization, randomized numerical linear algebra, and stochastic methods for empirical risk minimization. Several of his papers attracted international awards, including the SIAM SIGEST Best Paper Award, the IMA Leslie Fox Prize (2nd prize, twice), and the INFORMS Computing Society Best Student Paper Award (sole runner up). He is the founder and organizer of the Optimization and Big Data workshop series.
Related Events (a corresponding poster, oral, or spotlight)

2019 Oral: SGD: General Analysis and Improved Rates »
Tue. Jun 11th 09:40  10:00 PM Room Room 103
More from the Same Authors

2021 : EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback »
Peter Richtarik · Peter Richtarik · Ilyas Fatkhullin 
2021 : A general sample complexity analysis of vanilla policy gradient »
Rui Yuan · Robert Gower · Alessandro Lazaric 
2023 Poster: A Model Based Method for Minimizing CVaR and Beyond »
Si Yi Meng · Robert Gower 
2023 Poster: HighProbability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance »
Abdurakhmon Sadiev · Marina Danilova · Eduard Gorbunov · Samuel Horváth · Gauthier Gidel · Pavel Dvurechenskii · Alexander Gasnikov · Peter Richtarik 
2023 Poster: EF21P and Friends: Improved Theoretical Communication Complexity for Distributed Optimization with Bidirectional Compression »
Kaja Gruntkowska · Alexander Tyurin · Peter Richtarik 
2022 Poster: Proximal and Federated Random Reshuffling »
Konstantin Mishchenko · Ahmed Khaled · Peter Richtarik 
2022 Poster: A Convergence Theory for SVGD in the Population Limit under Talagrand's Inequality T1 »
Adil Salim · Lukang Sun · Peter Richtarik 
2022 Poster: 3PC: Three Point Compressors for CommunicationEfficient Distributed Training and a Better Theory for Lazy Aggregation »
Peter Richtarik · Igor Sokolov · Elnur Gasanov · Ilyas Fatkhullin · Zhize Li · Eduard Gorbunov 
2022 Spotlight: A Convergence Theory for SVGD in the Population Limit under Talagrand's Inequality T1 »
Adil Salim · Lukang Sun · Peter Richtarik 
2022 Spotlight: Proximal and Federated Random Reshuffling »
Konstantin Mishchenko · Ahmed Khaled · Peter Richtarik 
2022 Spotlight: 3PC: Three Point Compressors for CommunicationEfficient Distributed Training and a Better Theory for Lazy Aggregation »
Peter Richtarik · Igor Sokolov · Elnur Gasanov · Ilyas Fatkhullin · Zhize Li · Eduard Gorbunov 
2022 Poster: ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally! »
Konstantin Mishchenko · Grigory Malinovsky · Sebastian Stich · Peter Richtarik 
2022 Spotlight: ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally! »
Konstantin Mishchenko · Grigory Malinovsky · Sebastian Stich · Peter Richtarik 
2022 Poster: FedNL: Making NewtonType Methods Applicable to Federated Learning »
Mher Safaryan · Rustem Islamov · Xun Qian · Peter Richtarik 
2022 Spotlight: FedNL: Making NewtonType Methods Applicable to Federated Learning »
Mher Safaryan · Rustem Islamov · Xun Qian · Peter Richtarik 
2021 : Closing Remarks »
Shiqiang Wang · Nathalie Baracaldo · Olivia Choudhury · Gauri Joshi · Peter Richtarik · Praneeth Vepakomma · Han Yu 
2021 : Opening Remarks »
Shiqiang Wang · Nathalie Baracaldo · Olivia Choudhury · Gauri Joshi · Peter Richtarik · Praneeth Vepakomma · Han Yu 
2021 Poster: ADOM: Accelerated Decentralized Optimization Method for TimeVarying Networks »
Dmitry Kovalev · Egor Shulgin · Peter Richtarik · Alexander Rogozin · Alexander Gasnikov 
2021 Spotlight: ADOM: Accelerated Decentralized Optimization Method for TimeVarying Networks »
Dmitry Kovalev · Egor Shulgin · Peter Richtarik · Alexander Rogozin · Alexander Gasnikov 
2021 Poster: MARINA: Faster NonConvex Distributed Learning with Compression »
Eduard Gorbunov · Konstantin Burlachenko · Zhize Li · Peter Richtarik 
2021 Spotlight: MARINA: Faster NonConvex Distributed Learning with Compression »
Eduard Gorbunov · Konstantin Burlachenko · Zhize Li · Peter Richtarik 
2021 Poster: PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization »
Zhize Li · Hongyan Bao · Xiangliang Zhang · Peter Richtarik 
2021 Poster: Stochastic Sign Descent Methods: New Algorithms and Better Theory »
Mher Safaryan · Peter Richtarik 
2021 Poster: Distributed Second Order Methods with Fast Rates and Compressed Communication »
Rustem Islamov · Xun Qian · Peter Richtarik 
2021 Spotlight: Distributed Second Order Methods with Fast Rates and Compressed Communication »
Rustem Islamov · Xun Qian · Peter Richtarik 
2021 Oral: PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization »
Zhize Li · Hongyan Bao · Xiangliang Zhang · Peter Richtarik 
2021 Spotlight: Stochastic Sign Descent Methods: New Algorithms and Better Theory »
Mher Safaryan · Peter Richtarik 
2020 Poster: Stochastic Subspace Cubic Newton Method »
Filip Hanzely · Nikita Doikov · Yurii Nesterov · Peter Richtarik 
2020 Poster: Variance Reduced Coordinate Descent with Acceleration: New Method With a Surprising Application to FiniteSum Problems »
Filip Hanzely · Dmitry Kovalev · Peter Richtarik 
2020 Poster: Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization »
Zhize Li · Dmitry Kovalev · Xun Qian · Peter Richtarik 
2020 Poster: From Local SGD to Local FixedPoint Methods for Federated Learning »
Grigory Malinovsky · Dmitry Kovalev · Elnur Gasanov · Laurent CONDAT · Peter Richtarik 
2019 Poster: Stochastic Gradient Push for Distributed Deep Learning »
Mahmoud Assran · Nicolas Loizou · Nicolas Ballas · Michael Rabbat 
2019 Oral: Stochastic Gradient Push for Distributed Deep Learning »
Mahmoud Assran · Nicolas Loizou · Nicolas Ballas · Michael Rabbat 
2019 Poster: Nonconvex Variance Reduced Optimization with Arbitrary Sampling »
Samuel Horvath · Peter Richtarik 
2019 Poster: SAGA with Arbitrary Sampling »
Xun Qian · Zheng Qu · Peter Richtarik 
2019 Oral: SAGA with Arbitrary Sampling »
Xun Qian · Zheng Qu · Peter Richtarik 
2019 Oral: Nonconvex Variance Reduced Optimization with Arbitrary Sampling »
Samuel Horvath · Peter Richtarik 
2018 Poster: SGD and Hogwild! Convergence Without the Bounded Gradients Assumption »
Lam Nguyen · PHUONG_HA NGUYEN · Marten van Dijk · Peter Richtarik · Katya Scheinberg · Martin Takac 
2018 Oral: SGD and Hogwild! Convergence Without the Bounded Gradients Assumption »
Lam Nguyen · PHUONG_HA NGUYEN · Marten van Dijk · Peter Richtarik · Katya Scheinberg · Martin Takac