Trading Redundancy for Communication: Speeding up Distributed SGD for Non-convex Optimization
Farzin Haddadpour · Mohammad Mahdi Kamani · Mehrdad Mahdavi · Viveck Cadambe

Wed Jun 12th 11:35 -- 11:40 AM @ Room 103

Communication overhead is one of the key challenges hindering the scalability of distributed optimization algorithms for training large neural networks. In recent years, there has been a great deal of research on alleviating communication cost by compressing the gradient vector or by using local updates and periodic model averaging. In this paper, we aim to develop communication-efficient distributed stochastic algorithms for non-convex optimization through effective data replication strategies. In particular, we show, both theoretically and practically, that by properly infusing redundancy into the training data, together with model averaging, it is possible to significantly reduce the number of communication rounds. To be more precise, for a predetermined level of redundancy, the proposed algorithm samples mini-batches from redundant chunks of data from multiple workers in updating local solutions. As a byproduct, we also show that the proposed algorithm is robust to failures. Our empirical studies on the CIFAR10 and CIFAR100 datasets in a distributed environment complement our theoretical results.
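
The idea in the abstract (each worker holds several data chunks, takes local SGD steps on mini-batches drawn from them, and models are averaged only periodically) can be illustrated with a toy sketch. This is not the authors' algorithm or code: the cyclic chunk placement, the scalar least-squares objective (convex, chosen only for brevity), and all names such as `assign_chunks`, `make_chunks`, and `local_sgd_with_redundancy` are assumptions made for illustration.

```python
import random


def assign_chunks(num_workers, redundancy):
    """Hypothetical cyclic placement: worker i holds chunks i, i+1, ..., i+redundancy-1 (mod p)."""
    return [[(i + j) % num_workers for j in range(redundancy)]
            for i in range(num_workers)]


def make_chunks(num_chunks, chunk_size, true_w=3.0, seed=0):
    """Toy data: noise-free pairs (x, y) with y = true_w * x, split into chunks."""
    rng = random.Random(seed)
    chunks = []
    for _ in range(num_chunks):
        xs = [rng.uniform(0.5, 1.5) for _ in range(chunk_size)]
        chunks.append([(x, true_w * x) for x in xs])
    return chunks


def local_sgd_with_redundancy(chunks, redundancy, rounds, local_steps,
                              lr=0.1, batch_size=4, seed=1):
    """Local SGD with periodic averaging; each worker samples from its redundant chunks."""
    rng = random.Random(seed)
    num_workers = len(chunks)
    placement = assign_chunks(num_workers, redundancy)
    w_global = 0.0
    for _ in range(rounds):
        local_models = []
        for worker in range(num_workers):
            w = w_global  # start each round from the averaged model
            for _ in range(local_steps):
                grad = 0.0
                for _ in range(batch_size):
                    # Draw a sample from one of the chunks held by this worker.
                    chunk_id = rng.choice(placement[worker])
                    x, y = rng.choice(chunks[chunk_id])
                    grad += 2.0 * (w * x - y) * x  # gradient of (w*x - y)^2
                w -= lr * grad / batch_size
            local_models.append(w)
        # One communication round: average the local models.
        w_global = sum(local_models) / num_workers
    return w_global


chunks = make_chunks(num_chunks=4, chunk_size=20)
w_hat = local_sgd_with_redundancy(chunks, redundancy=2, rounds=20, local_steps=10)
print(round(w_hat, 3))
```

In this sketch, `redundancy` controls how many chunks each worker can sample from and `local_steps` controls how many updates happen between averaging rounds; the paper's actual trade-off between these quantities is established by its analysis, not by this toy example.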

Author Information

Farzin Haddadpour (Pennsylvania State University)
Mohammad Mahdi Kamani (The Pennsylvania State University)
Mehrdad Mahdavi (Pennsylvania State University)
Viveck Cadambe (Pennsylvania State University)

Related Events (a corresponding poster, oral, or spotlight)