Zeno: Distributed Stochastic Gradient Descent with Suspicion-based Fault-tolerance
Cong Xie · Sanmi Koyejo · Indranil Gupta

Wed Jun 12th 12:05 -- 12:10 PM @ Room 102

We present Zeno, a technique to make distributed machine learning, particularly Stochastic Gradient Descent (SGD), tolerant to an arbitrary number of faulty workers. This generalizes previous results that assumed a majority of non-faulty nodes; we need assume only one non-faulty worker. Our key idea is to suspect workers that are potentially defective. Since this is likely to lead to false positives, we use a ranking-based preference mechanism. We prove the convergence of SGD for non-convex problems under these scenarios. Experimental results show that Zeno outperforms existing approaches.
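The abstract does not spell out the scoring rule, so the following is only a minimal sketch of what a suspicion-based, ranking-driven aggregation step could look like. It assumes each candidate gradient is scored by the estimated loss decrease it yields on a small validation batch, minus a penalty on its squared norm, and that the lowest-ranked candidates are dropped before averaging. The names `suspicion_aggregate`, `loss_fn`, `lr`, `rho`, and `num_trimmed` are hypothetical, and the quadratic loss is a toy stand-in chosen for illustration.

```python
import numpy as np

def suspicion_aggregate(x, grads, loss_fn, lr, rho, num_trimmed):
    """Rank candidate gradients by a suspicion score and average the best ones.

    Assumed score (illustrative, not taken from the abstract):
        loss_fn(x) - loss_fn(x - lr * g) - rho * ||g||^2
    A higher score means the candidate looks more trustworthy.
    """
    base = loss_fn(x)
    scores = np.array(
        [base - loss_fn(x - lr * g) - rho * np.dot(g, g) for g in grads]
    )
    # Sort by descending score and keep all but the `num_trimmed` lowest-ranked.
    order = np.argsort(scores)[::-1]
    keep = order[: len(grads) - num_trimmed]
    aggregated = np.mean([grads[i] for i in keep], axis=0)
    return aggregated, scores

# Toy problem: loss 0.5 * ||x||^2, whose true gradient at x is x itself.
def loss_fn(x):
    return 0.5 * np.dot(x, x)

x = np.array([1.0, -2.0])
grads = [
    np.array([1.0, -2.0]),    # honest worker
    np.array([1.1, -1.9]),    # honest worker with noise
    np.array([0.9, -2.1]),    # honest worker with noise
    np.array([-10.0, 20.0]),  # faulty worker: negated and scaled gradient
]

agg, scores = suspicion_aggregate(
    x, grads, loss_fn, lr=0.1, rho=0.001, num_trimmed=1
)
print("scores:", scores)
print("aggregated gradient:", agg)
```

In this toy run the faulty candidate increases the loss, so it ranks last and is excluded from the average. Because the rule ranks candidates instead of applying a fixed threshold, an honest but noisy gradient is only dropped when others look better.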

Author Information

Cong Xie (UIUC)
Sanmi Koyejo (Illinois / Google)

Sanmi (Oluwasanmi) Koyejo is an Assistant Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Koyejo's research interests are in the development and analysis of probabilistic and statistical machine learning techniques motivated by, and applied to, various modern big data problems. He is particularly interested in the analysis of large-scale neuroimaging data. Koyejo completed his Ph.D. in Electrical Engineering at the University of Texas at Austin, advised by Joydeep Ghosh, and completed postdoctoral research at Stanford University with a focus on developing machine learning techniques for neuroimaging data. His postdoctoral research was primarily with Russell A. Poldrack and Pradeep Ravikumar. Koyejo has received several awards, including the outstanding NCE/ECE student award, a best student paper award from the Conference on Uncertainty in Artificial Intelligence (UAI), and a trainee award from the Organization for Human Brain Mapping (OHBM).

Indranil Gupta (UIUC)
