Recent research on online Gradient Balancing (GraB) reveals that there exist permutation-based data example orders that are guaranteed to outperform random reshuffling (RR). Whereas RR arbitrarily permutes training data examples, GraB leverages information in stale example gradients from prior epochs to order examples for the next epoch, achieving a provably faster convergence rate than RR. However, GraB is limited by design: while it demonstrates an impressive ability to scale up training on \emph{centralized} data, it does not naturally extend to modern \emph{distributed} ML workloads. We therefore propose \emph{Coordinated Distributed GraB} (CD-GraB), which uses insights from prior work on kernel thinning to translate the benefits of provably faster permutation-based example ordering to distributed settings. With negligible overhead, CD-GraB exhibits a linear speedup in convergence rate over centralized GraB and empirically outperforms baselines, including distributed RR, on a variety of benchmark tasks.
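To make the ordering idea concrete, below is a minimal sketch of the greedy gradient-balancing step that underlies GraB-style reordering. This is an illustrative approximation rather than the authors' implementation: it assumes per-example gradient vectors from the previous epoch have been stored and centered (mean gradient subtracted), and the function name `reorder_with_balancing` is hypothetical.

```python
import numpy as np

def reorder_with_balancing(stale_grads):
    """Greedy sign-balancing reorder (a sketch of the GraB idea).

    stale_grads: list of per-example gradient vectors from the previous
    epoch, assumed centered (mean gradient already subtracted).
    Returns a permutation of example indices to use in the next epoch.
    """
    running = np.zeros_like(stale_grads[0])
    front, back = [], []
    for i, g in enumerate(stale_grads):
        # Greedily pick the sign that keeps the running sum small,
        # balancing positive and negative gradient contributions.
        if np.linalg.norm(running + g) <= np.linalg.norm(running - g):
            running += g
            front.append(i)   # examples assigned +1 go to the front
        else:
            running -= g
            back.append(i)    # examples assigned -1 go to the back (reversed)
    return front + back[::-1]
```

In the distributed setting, CD-GraB coordinates an analogous balancing step across workers using ideas from kernel thinning; the sketch above only illustrates the single-machine ordering principle.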
Author Information
A. Feder Cooper (Cornell University)
Wentao Guo (Cornell University)
I am a Master of Engineering student in CS at Cornell University. I previously obtained my bachelor's degree in CS at Cornell University as well.
Duc Khiem Pham (Cornell University)
Tiancheng Yuan (Cornell University)
Charlie Ruan (Cornell University)
Yucheng Lu (Cornell University)
Chris De Sa (Cornell University)
More from the Same Authors
- 2023 Poster: InfoDiffusion: Representation Learning Using Information Maximizing Diffusion Models
  Yingheng Wang · Yair Schiff · Aaron Gokaslan · Weishen Pan · Fei Wang · Chris De Sa · Volodymyr Kuleshov
- 2023 Poster: CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks
  Jue Wang · Yucheng Lu · Binhang Yuan · Beidi Chen · Percy Liang · Chris De Sa · Christopher Re · Ce Zhang
- 2023 Poster: STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition
  Yucheng Lu · Shivani Agrawal · Suvinay Subramanian · Oleg Rybakov · Chris De Sa · Amir Yazdanbakhsh
- 2022: MCTensor: A High-Precision Deep Learning Library with Multi-Component Floating-Point
  Tao Yu · Wentao Guo · Canal Li · Tiancheng Yuan · Christopher De Sa
- 2021 Poster: Variance Reduced Training with Stratified Sampling for Forecasting Models
  Yucheng Lu · Youngsuk Park · Lifan Chen · Yuyang Wang · Christopher De Sa · Dean Foster
- 2021 Spotlight: Variance Reduced Training with Stratified Sampling for Forecasting Models
  Yucheng Lu · Youngsuk Park · Lifan Chen · Yuyang Wang · Christopher De Sa · Dean Foster
- 2021 Poster: Optimal Complexity in Decentralized Training
  Yucheng Lu · Christopher De Sa
- 2021 Oral: Optimal Complexity in Decentralized Training
  Yucheng Lu · Christopher De Sa
- 2020 Poster: Moniqua: Modulo Quantized Communication in Decentralized SGD
  Yucheng Lu · Christopher De Sa