Oral
The Hidden Vulnerability of Distributed Learning in Byzantium
El Mahdi El Mhamdi · Rachid Guerraoui · Sébastien Rouault
Abstract:
While machine learning is going through an era of celebrated success, concerns have been raised about the vulnerability of its backbone: stochastic gradient descent (SGD). Recent approaches have been proposed to ensure the robustness of distributed SGD against adversarial (Byzantine) workers sending poisoned gradients during the training phase. Some of these approaches have been proven Byzantine-resilient: they ensure the convergence of SGD despite the presence of a minority of adversarial workers. We show in this paper that convergence is not enough. In high dimension $d \gg 1$, an adversary can build on the loss function's non-convexity to make SGD converge to ineffective models. More precisely, we bring to light that existing Byzantine-resilient schemes leave a margin of poisoning of $\Omega(f(d))$, where $f(d)$ increases at least like $\sqrt[p]{d}$. Based on this leeway, we build a simple attack and experimentally show its effectiveness, ranging from strong to total, on CIFAR-10 and MNIST. We introduce Bulyan, and prove it significantly reduces the attacker's leeway to a narrow $O(1/\sqrt{d})$ bound. We empirically show that Bulyan does not suffer the fragility of existing aggregation rules and, at a reasonable cost in terms of required batch size, achieves convergence as if only non-Byzantine gradients had been used to update the model.
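The abstract names Bulyan but does not spell out its construction here. As an illustration only, the Python sketch below shows a Bulyan-style two-stage aggregation: a Byzantine-resilient selection step (taken here to be Krum) followed by a coordinate-wise trimmed average around the median. The function names, the choice of Krum as the inner rule, and the parameter settings (n >= 4f + 3, theta = n - 2f, beta = theta - 2f) are assumptions made for this sketch, not the authors' reference implementation.

```python
# Illustrative sketch of a Bulyan-style aggregation (assumptions noted above).
import numpy as np

def krum_select(gradients, f):
    """Return the index of the gradient with the smallest Krum-style score:
    the sum of squared distances to its closest neighbours."""
    n = len(gradients)
    dists = np.array([[np.sum((g - h) ** 2) for h in gradients] for g in gradients])
    m = max(1, n - f - 2)  # number of neighbours to score against
    scores = [np.sum(np.sort(dists[i])[1:m + 1]) for i in range(n)]  # skip self (distance 0)
    return int(np.argmin(scores))

def bulyan(gradients, f):
    """Bulyan-style aggregation of worker gradients (list of 1-D arrays),
    assuming n >= 4f + 3 workers with at most f of them Byzantine."""
    grads = list(gradients)
    n = len(grads)
    theta = n - 2 * f
    # Stage 1: iteratively pick theta gradients with the inner rule (Krum here).
    selected = []
    for _ in range(theta):
        i = krum_select(grads, f)
        selected.append(grads.pop(i))
    S = np.stack(selected)                       # shape (theta, d)
    # Stage 2: per coordinate, keep the beta = theta - 2f values closest
    # to the median and average them.
    beta = theta - 2 * f
    med = np.median(S, axis=0)
    order = np.argsort(np.abs(S - med), axis=0)[:beta]
    trimmed = np.take_along_axis(S, order, axis=0)
    return trimmed.mean(axis=0)
```

In a distributed SGD loop, the parameter server would call an aggregation rule of this kind on the gradients received from the n workers at each step, and apply the aggregated vector to update the model.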