We propose and study a new class of gradient compressors for communication-efficient training, three point compressors (3PC), as well as efficient distributed nonconvex optimization algorithms that can take advantage of them. Unlike most established approaches, which rely on a static compressor choice (e.g., TopK), our class allows the compressors to evolve throughout the training process, with the aim of improving the theoretical communication complexity and practical efficiency of the underlying methods. We show that our general approach can recover the recently proposed state-of-the-art error feedback mechanism EF21 (Richtárik et al., 2021) and its theoretical properties as a special case, but also leads to a number of new efficient methods. Notably, our approach allows us to improve upon the state-of-the-art in the algorithmic and theoretical foundations of the lazy aggregation literature (Liu et al., 2017; Lan et al., 2017). As a by-product that may be of independent interest, we provide a new and fundamental link between the lazy aggregation and error feedback literature. A special feature of our work is that we do not require the compressors to be unbiased.
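To make the abstract's two named ingredients concrete, here is a minimal sketch of a TopK compressor and of an EF21-style update, in which a worker compresses the difference between its fresh gradient and its previous gradient estimate. This illustrates only the EF21 special case mentioned above, not the general 3PC definition, which the abstract does not spell out. The function names `top_k` and `ef21_update`, the use of NumPy, and the toy vectors are assumptions made for illustration.

```python
import numpy as np


def top_k(v, k):
    """TopK compressor: keep the k largest-magnitude entries of v, zero the rest.

    This is a biased compressor (its expectation is not v in general).
    """
    out = np.zeros_like(v)
    k = min(k, v.size)
    if k <= 0:
        return out
    # Indices of the k entries with the largest absolute value.
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out


def ef21_update(g_prev, grad_new, k):
    """EF21-style step (sketch): g_new = g_prev + TopK(grad_new - g_prev).

    Only the compressed difference would need to be communicated.
    """
    return g_prev + top_k(grad_new - g_prev, k)


# Toy usage: the estimate g approaches a fixed gradient over repeated updates.
grad = np.array([1.0, -5.0, 3.0, 0.5])
g = np.zeros_like(grad)
g = ef21_update(g, grad, k=2)  # picks up the two largest entries first
g = ef21_update(g, grad, k=2)  # then the remaining two
```

In this toy run the gradient is held fixed, so the estimate matches it exactly after two updates; during training the gradient changes at every step and the estimate only tracks it.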
Author Information
Peter Richtarik (KAUST)
Peter Richtarik is an Associate Professor of Computer Science and Mathematics at KAUST and an Associate Professor of Mathematics at the University of Edinburgh. He is an EPSRC Fellow in Mathematical Sciences, Fellow of the Alan Turing Institute, and is affiliated with the Visual Computing Center and the Extreme Computing Research Center at KAUST. Dr. Richtarik received his PhD from Cornell University in 2007, and then worked as a Postdoctoral Fellow in Louvain, Belgium, before joining Edinburgh in 2009, and KAUST in 2017. Dr. Richtarik's research interests lie at the intersection of mathematics, computer science, machine learning, optimization, numerical linear algebra, high performance computing and applied probability. Through his recent work on randomized decomposition algorithms (such as randomized coordinate descent methods, stochastic gradient descent methods and their numerous extensions, improvements and variants), he has contributed to the foundations of the emerging field of big data optimization, randomized numerical linear algebra, and stochastic methods for empirical risk minimization. Several of his papers attracted international awards, including the SIAM SIGEST Best Paper Award, the IMA Leslie Fox Prize (2nd prize, twice), and the INFORMS Computing Society Best Student Paper Award (sole runner up). He is the founder and organizer of the Optimization and Big Data workshop series.
Igor Sokolov (King Abdullah University of Science and Technology)
Elnur Gasanov (KAUST)
Ilyas Fatkhullin (ETH Zurich)
Zhize Li (Carnegie Mellon University)
Eduard Gorbunov (Moscow Institute of Physics and Technology)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Spotlight: 3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation »
Thu. Jul 21st 06:55 -- 07:00 PM Room Ballroom 3 & 4
More from the Same Authors
-
2021 : FedMix: A Simple and Communication-Efficient Alternative to Local Methods in Federated Learning »
Elnur Gasanov -
2021 : EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback »
Peter Richtarik · Ilyas Fatkhullin -
2023 : Improving Accelerated Federated Learning with Compression and Importance Sampling »
Michał Grudzień · Grigory Malinovsky · Peter Richtarik -
2023 : Federated Learning with Regularized Client Participation »
Grigory Malinovsky · Samuel Horváth · Konstantin Burlachenko · Peter Richtarik -
2023 : Federated Optimization Algorithms with Random Reshuffling and Gradient Compression »
Abdurakhmon Sadiev · Grigory Malinovsky · Eduard Gorbunov · Igor Sokolov · Ahmed Khaled · Konstantin Burlachenko · Peter Richtarik -
2023 : Momentum Provably Improves Error Feedback! »
Ilyas Fatkhullin · Alexander Tyurin · Peter Richtarik -
2023 : ELF: Federated Langevin Algorithms with Primal, Dual and Bidirectional Compression »
Avetik Karagulyan · Peter Richtarik -
2023 : Towards a Better Theoretical Understanding of Independent Subnetwork Training »
Egor Shulgin · Peter Richtarik -
2023 : Convergence of First-Order Algorithms for Meta-Learning with Moreau Envelopes »
Konstantin Mishchenko · Slavomír Hanzely · Peter Richtarik -
2023 Poster: High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance »
Abdurakhmon Sadiev · Marina Danilova · Eduard Gorbunov · Samuel Horváth · Gauthier Gidel · Pavel Dvurechenskii · Alexander Gasnikov · Peter Richtarik -
2023 Poster: Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space »
Anas Barakat · Ilyas Fatkhullin · Niao He -
2023 Poster: Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies »
Ilyas Fatkhullin · Anas Barakat · Anastasia Kireeva · Niao He -
2023 Poster: EF21-P and Friends: Improved Theoretical Communication Complexity for Distributed Optimization with Bidirectional Compression »
Kaja Gruntkowska · Alexander Tyurin · Peter Richtarik -
2022 Poster: Proximal and Federated Random Reshuffling »
Konstantin Mishchenko · Ahmed Khaled · Peter Richtarik -
2022 Poster: Secure Distributed Training at Scale »
Eduard Gorbunov · Alexander Borzunov · Michael Diskin · Max Ryabinin -
2022 Poster: A Convergence Theory for SVGD in the Population Limit under Talagrand's Inequality T1 »
Adil Salim · Lukang Sun · Peter Richtarik -
2022 Spotlight: A Convergence Theory for SVGD in the Population Limit under Talagrand's Inequality T1 »
Adil Salim · Lukang Sun · Peter Richtarik -
2022 Spotlight: Proximal and Federated Random Reshuffling »
Konstantin Mishchenko · Ahmed Khaled · Peter Richtarik -
2022 Spotlight: Secure Distributed Training at Scale »
Eduard Gorbunov · Alexander Borzunov · Michael Diskin · Max Ryabinin -
2022 Poster: ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally! »
Konstantin Mishchenko · Grigory Malinovsky · Sebastian Stich · Peter Richtarik -
2022 Spotlight: ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally! »
Konstantin Mishchenko · Grigory Malinovsky · Sebastian Stich · Peter Richtarik -
2022 Poster: FedNL: Making Newton-Type Methods Applicable to Federated Learning »
Mher Safaryan · Rustem Islamov · Xun Qian · Peter Richtarik -
2022 Spotlight: FedNL: Making Newton-Type Methods Applicable to Federated Learning »
Mher Safaryan · Rustem Islamov · Xun Qian · Peter Richtarik -
2021 : Closing Remarks »
Shiqiang Wang · Nathalie Baracaldo · Olivia Choudhury · Gauri Joshi · Peter Richtarik · Praneeth Vepakomma · Han Yu -
2021 : Opening Remarks »
Shiqiang Wang · Nathalie Baracaldo · Olivia Choudhury · Gauri Joshi · Peter Richtarik · Praneeth Vepakomma · Han Yu -
2021 Poster: ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks »
Dmitry Kovalev · Egor Shulgin · Peter Richtarik · Alexander Rogozin · Alexander Gasnikov -
2021 Spotlight: ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks »
Dmitry Kovalev · Egor Shulgin · Peter Richtarik · Alexander Rogozin · Alexander Gasnikov -
2021 Poster: MARINA: Faster Non-Convex Distributed Learning with Compression »
Eduard Gorbunov · Konstantin Burlachenko · Zhize Li · Peter Richtarik -
2021 Spotlight: MARINA: Faster Non-Convex Distributed Learning with Compression »
Eduard Gorbunov · Konstantin Burlachenko · Zhize Li · Peter Richtarik -
2021 Poster: PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization »
Zhize Li · Hongyan Bao · Xiangliang Zhang · Peter Richtarik -
2021 Poster: Stochastic Sign Descent Methods: New Algorithms and Better Theory »
Mher Safaryan · Peter Richtarik -
2021 Poster: Distributed Second Order Methods with Fast Rates and Compressed Communication »
Rustem Islamov · Xun Qian · Peter Richtarik -
2021 Spotlight: Distributed Second Order Methods with Fast Rates and Compressed Communication »
Rustem Islamov · Xun Qian · Peter Richtarik -
2021 Oral: PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization »
Zhize Li · Hongyan Bao · Xiangliang Zhang · Peter Richtarik -
2021 Spotlight: Stochastic Sign Descent Methods: New Algorithms and Better Theory »
Mher Safaryan · Peter Richtarik -
2020 Poster: Stochastic Subspace Cubic Newton Method »
Filip Hanzely · Nikita Doikov · Yurii Nesterov · Peter Richtarik -
2020 Poster: Variance Reduced Coordinate Descent with Acceleration: New Method With a Surprising Application to Finite-Sum Problems »
Filip Hanzely · Dmitry Kovalev · Peter Richtarik -
2020 Poster: Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization »
Zhize Li · Dmitry Kovalev · Xun Qian · Peter Richtarik -
2020 Poster: From Local SGD to Local Fixed-Point Methods for Federated Learning »
Grigory Malinovsky · Dmitry Kovalev · Elnur Gasanov · Laurent Condat · Peter Richtarik -
2019 Poster: Nonconvex Variance Reduced Optimization with Arbitrary Sampling »
Samuel Horvath · Peter Richtarik -
2019 Poster: SAGA with Arbitrary Sampling »
Xun Qian · Zheng Qu · Peter Richtarik -
2019 Poster: SGD: General Analysis and Improved Rates »
Robert Gower · Nicolas Loizou · Xun Qian · Alibek Sailanbayev · Egor Shulgin · Peter Richtarik -
2019 Oral: SAGA with Arbitrary Sampling »
Xun Qian · Zheng Qu · Peter Richtarik -
2019 Oral: SGD: General Analysis and Improved Rates »
Robert Gower · Nicolas Loizou · Xun Qian · Alibek Sailanbayev · Egor Shulgin · Peter Richtarik -
2019 Oral: Nonconvex Variance Reduced Optimization with Arbitrary Sampling »
Samuel Horvath · Peter Richtarik -
2018 Poster: SGD and Hogwild! Convergence Without the Bounded Gradients Assumption »
Lam Nguyen · Phuong Ha Nguyen · Marten van Dijk · Peter Richtarik · Katya Scheinberg · Martin Takac -
2018 Oral: SGD and Hogwild! Convergence Without the Bounded Gradients Assumption »
Lam Nguyen · Phuong Ha Nguyen · Marten van Dijk · Peter Richtarik · Katya Scheinberg · Martin Takac