Error Feedback Fixes SignSGD and other Gradient Compression Schemes
Sai Praneeth Reddy Karimireddy · Quentin Rebjock · Sebastian Stich · Martin Jaggi

Tue Jun 11th 11:40 AM -- 12:00 PM @ Room 104

Sign-based algorithms (e.g. signSGD) have been proposed as a biased gradient compression technique to alleviate the communication bottleneck in training large neural networks across multiple workers. We show simple convex counter-examples where signSGD does not converge to the optimum. Further, even when it does converge, signSGD may generalize poorly when compared with SGD. These issues arise because of the biased nature of the sign compression operator.

Finally, we show that using error feedback, i.e. incorporating the error made by the compression operator into the next step, overcomes these issues. We prove that our algorithm, EF-SGD, with an arbitrary compression operator, achieves the same rate of convergence as SGD without any additional assumptions, indicating that we get gradient compression for free. Our experiments thoroughly substantiate the theory, showing the superiority of our algorithm.
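
The error-feedback step described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a scaled sign compressor (the sign vector rescaled by the mean absolute value) as one example of a compression operator, and a toy quadratic objective with exact gradients. The names `scaled_sign`, `ef_sgd`, `target`, and the step size are hypothetical choices made for illustration.

```python
import numpy as np


def scaled_sign(p):
    """Illustrative compressor: signs of p, rescaled by its mean absolute value."""
    return (np.abs(p).sum() / p.size) * np.sign(p)


def ef_sgd(grad, x0, lr, steps, compress=scaled_sign):
    """Gradient descent with a compressed update and an error-feedback memory."""
    x = np.asarray(x0, dtype=float).copy()
    e = np.zeros_like(x)          # error left over from earlier compressions
    for _ in range(steps):
        p = lr * grad(x) + e      # add the stored error to the new step
        delta = compress(p)       # only the compressed step would be communicated
        x = x - delta             # apply the compressed step
        e = p - delta             # remember what the compressor dropped
    return x, e


# Toy objective (an assumption for this sketch): f(x) = 0.5 * ||x - target||^2
target = np.array([1.0, -2.0, 0.5])
x_final, e_final = ef_sgd(lambda x: x - target, np.zeros(3), lr=0.1, steps=2000)
print(np.linalg.norm(x_final - target))
```

The memory `e` is what separates this from plain compressed descent: whatever the compressor discards in one step is added back before the next compression, so no part of the gradient is lost permanently.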

Author Information

Praneeth Karimireddy (EPFL)
Quentin Rebjock (EPFL)
Sebastian Stich (EPFL)
Martin Jaggi (EPFL)
