CD-GraB: Coordinating Distributed Example Orders for Provably Accelerated Training
A. Feder Cooper · Wentao Guo · Duc Khiem Pham · Tiancheng Yuan · Charlie Ruan · Yucheng Lu · Chris De Sa

Recent research on online Gradient Balancing (GraB) reveals that there exist permutation-based data example orders that are guaranteed to outperform random reshuffling (RR). Whereas RR arbitrarily permutes training data examples, GraB leverages information in stale example gradients from prior epochs to order examples for the next epoch, achieving a provably faster convergence rate than RR. However, GraB is limited by design: while it demonstrates an impressive ability to scale up training on centralized data, it does not naturally extend to modern distributed ML workloads. We therefore propose Coordinated Distributed GraB (CD-GraB), which uses insights from prior work on kernel thinning to translate the benefits of provably faster permutation-based example ordering to distributed settings. With negligible overhead, CD-GraB exhibits a linear speedup in convergence rate over centralized GraB and outperforms baselines empirically, including distributed RR, on a variety of benchmark tasks.
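
To make the idea of ordering examples from stale gradients concrete, the following is a minimal sketch of a greedy sign-balancing reorder in the spirit of centralized GraB. It is an illustration under stated assumptions, not the authors' implementation and not the distributed CD-GraB algorithm: the function name balance_reorder, the list-of-lists gradient layout, and the tie-breaking rule are all hypothetical choices made for this example.

```python
import random


def balance_reorder(grads, order):
    """Build the next epoch's order from stale per-example gradients.

    grads[i] is the (stale) gradient of example i as a list of floats;
    order is the permutation used in the epoch that produced them.
    Hypothetical helper for illustration only.
    """
    n, d = len(order), len(grads[0])
    # Center the gradients so that only deviations from the mean are balanced.
    mean = [sum(grads[i][k] for i in order) / n for k in range(d)]
    running = [0.0] * d   # signed running sum of centered gradients
    front, back = [], []  # examples assigned sign +1 / sign -1
    for i in order:
        c = [grads[i][k] - mean[k] for k in range(d)]
        plus = sum((running[k] + c[k]) ** 2 for k in range(d))
        minus = sum((running[k] - c[k]) ** 2 for k in range(d))
        # Greedily pick the sign that keeps the running sum small
        # (ties go to +1, an arbitrary choice for this sketch).
        if plus <= minus:
            running = [running[k] + c[k] for k in range(d)]
            front.append(i)
        else:
            running = [running[k] - c[k] for k in range(d)]
            back.append(i)
    # +1 examples keep their relative order; -1 examples follow in reverse.
    return front + back[::-1]


# Usage with synthetic gradients (assumed data, for illustration only).
random.seed(0)
stale = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(8)]
next_order = balance_reorder(stale, list(range(8)))
```

The result is always a permutation of the input order, so it can replace the random shuffle that RR would draw for the next epoch. How CD-GraB coordinates such orders across workers is described in the paper itself and is not reproduced here.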

Author Information

A. Feder Cooper (Cornell University)
Wentao Guo (Cornell University)
Wentao Guo

I am a Master of Engineering student in CS at Cornell University. Previously, I also obtained my bachelor's degree in CS at Cornell University.

Duc Khiem Pham (Cornell University)
Tiancheng Yuan (Cornell University)
Charlie Ruan (Cornell University)
Yucheng Lu (Cornell University)
Chris De Sa (Cornell University)