Distributed model training is vulnerable to Byzantine system failures and adversarial compute nodes, i.e., nodes that use malicious updates to corrupt the global model stored at a parameter server (PS). To guarantee some form of robustness, recent work suggests using variants of the geometric median as an aggregation rule, in place of gradient averaging. Unfortunately, median-based rules can incur a prohibitive computational overhead in large-scale settings, and their convergence guarantees often require strong assumptions. In this work, we present DRACO, a scalable framework for robust distributed training that uses ideas from coding theory. In DRACO, each compute node evaluates redundant gradients that the parameter server uses to eliminate the effects of adversarial updates. DRACO comes with problem-independent robustness guarantees, and the model it trains is identical to the one trained in the adversary-free setup. We provide extensive experiments on real datasets and distributed setups across a variety of large-scale models, showing that DRACO is several times to orders of magnitude faster than median-based approaches.
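To make the redundancy idea concrete, the snippet below is a minimal, illustrative sketch under a simple repetition-style assignment: each gradient is computed by r distinct workers, and the parameter server keeps the value reported by a majority of them, which recovers the exact gradient whenever strictly fewer than r/2 of the copies are adversarial. The function names and the plain majority-vote decoder are our own simplification for illustration, not the authors' implementation.

```python
import numpy as np

def repetition_decode(copies):
    """Pick the gradient value reported by the most workers (exact-match majority vote).

    `copies` is a list of r numpy arrays that should all equal the same true
    gradient; honest workers return identical values, so if strictly fewer
    than r/2 copies are adversarial, the majority value is the exact gradient.
    """
    votes = {}
    for g in copies:
        key = g.tobytes()          # exact match, since honest copies are identical
        if key not in votes:
            votes[key] = [0, g]
        votes[key][0] += 1
    _, grad = max(votes.values(), key=lambda v: v[0])
    return grad

def aggregate(grad_groups):
    """Server-side aggregation: decode each group of redundant copies, then
    average the recovered gradients as in the adversary-free computation."""
    recovered = [repetition_decode(group) for group in grad_groups]
    return np.mean(recovered, axis=0)

# Toy usage: one gradient computed by r = 3 workers, one copy corrupted.
true_g = np.array([1.0, -2.0, 0.5])
corrupted = np.array([1e6, 1e6, 1e6])
print(aggregate([[true_g, corrupted, true_g]]))   # -> [ 1.  -2.   0.5]
```

Because honest copies are bit-identical, the decoded value is exact rather than approximate, which is consistent with the abstract's claim that the trained model matches the adversary-free run (in contrast to median-style rules, which only approximate the average).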
Author Information
Lingjiao Chen (University of Wisconsin-Madison)
Hongyi Wang (University of Wisconsin-Madison)
I'm currently a second-year Ph.D. student in the Computer Sciences Department at the University of Wisconsin–Madison, advised by Prof. Dimitris Papailiopoulos. My research interests lie in machine learning, distributed systems, and large-scale optimization.
Zachary Charles (University of Wisconsin-Madison)
Dimitris Papailiopoulos (ECE at University of Wisconsin-Madison)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Oral: DRACO: Byzantine-resilient Distributed Training via Redundant Gradients »
  Fri. Jul 13th 02:40 -- 02:50 PM, Room A3
More from the Same Authors
- 2021 : Empirical Study on the Effective VC Dimension of Low-rank Neural Networks »
  Daewon Seo · Hongyi Wang · Dimitris Papailiopoulos · Kangwook Lee
- 2021 : Have the Cake and Eat It Too? Higher Accuracy and Less Expense when Using Multi-label ML APIs Online »
  Lingjiao Chen · James Zou · Matei Zaharia
- 2021 : Machine Learning API Shift Assessments: Change is Coming! »
  Lingjiao Chen · James Zou · Matei Zaharia
- 2023 : Teaching Arithmetic to Small Transformers »
  Nayoung Lee · Kartik Sreenivasan · Jason Lee · Kangwook Lee · Dimitris Papailiopoulos
- 2023 : Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding »
  Seongjun Yang · Gibbeum Lee · Jaewoong Cho · Dimitris Papailiopoulos · Kangwook Lee
- 2023 : Looped Transformers are Better at Learning Learning Algorithms »
  Liu Yang · Kangwook Lee · Robert Nowak · Dimitris Papailiopoulos
- 2023 Poster: Looped Transformers as Programmable Computers »
  Angeliki Giannou · Shashank Rajput · Jy-yong Sohn · Kangwook Lee · Jason Lee · Dimitris Papailiopoulos
- 2023 Poster: Transformers as Algorithms: Generalization and Stability in In-context Learning »
  Yingcong Li · Muhammed Ildiz · Dimitris Papailiopoulos · Samet Oymak
- 2022 Poster: GenLabel: Mixup Relabeling using Generative Models »
  Jy-yong Sohn · Liang Shang · Hongxu Chen · Jaekyun Moon · Dimitris Papailiopoulos · Kangwook Lee
- 2022 Poster: Efficient Online ML API Selection for Multi-Label Classification Tasks »
  Lingjiao Chen · Matei Zaharia · James Zou
- 2022 Spotlight: Efficient Online ML API Selection for Multi-Label Classification Tasks »
  Lingjiao Chen · Matei Zaharia · James Zou
- 2022 Spotlight: GenLabel: Mixup Relabeling using Generative Models »
  Jy-yong Sohn · Liang Shang · Hongxu Chen · Jaekyun Moon · Dimitris Papailiopoulos · Kangwook Lee
- 2021 : Dreaming of Federated Robustness: Inherent Barriers and Unavoidable Tradeoffs »
  Dimitris Papailiopoulos
- 2020 Poster: Closing the convergence gap of SGD without replacement »
  Shashank Rajput · Anant Gupta · Dimitris Papailiopoulos
- 2019 : Poster Session I »
  Stark Draper · Mehmet Aktas · Basak Guler · Hongyi Wang · Venkata Gandikota · Hyegyeong Park · Jinhyun So · Lev Tauz · hema venkata krishna giri Narra · Zhifeng Lin · Mohammadali Maddahali · Yaoqing Yang · Sanghamitra Dutta · Amirhossein Reisizadeh · Jianyu Wang · Eren Balevi · Siddharth Jain · Paul McVay · Michael Rudow · Pedro Soto · Jun Li · Adarsh Subramaniam · Umut Demirhan · Vipul Gupta · Deniz Oktay · Leighton P Barnes · Johannes Ballé · Farzin Haddadpour · Haewon Jeong · Rong-Rong Chen · Mohammad Fahim
- 2019 Workshop: Coding Theory For Large-scale Machine Learning »
  Viveck Cadambe · Pulkit Grover · Dimitris Papailiopoulos · Gauri Joshi
- 2019 Poster: Does Data Augmentation Lead to Positive Margin? »
  Shashank Rajput · Zhili Feng · Zachary Charles · Po-Ling Loh · Dimitris Papailiopoulos
- 2019 Oral: Does Data Augmentation Lead to Positive Margin? »
  Shashank Rajput · Zhili Feng · Zachary Charles · Po-Ling Loh · Dimitris Papailiopoulos
- 2018 Poster: Reinforcing Adversarial Robustness using Model Confidence Induced by Adversarial Training »
  Xi Wu · Wooyeong Jang · Jiefeng Chen · Lingjiao Chen · Somesh Jha
- 2018 Oral: Reinforcing Adversarial Robustness using Model Confidence Induced by Adversarial Training »
  Xi Wu · Wooyeong Jang · Jiefeng Chen · Lingjiao Chen · Somesh Jha
- 2018 Poster: Stability and Generalization of Learning Algorithms that Converge to Global Optima »
  Zachary Charles · Dimitris Papailiopoulos
- 2018 Oral: Stability and Generalization of Learning Algorithms that Converge to Global Optima »
  Zachary Charles · Dimitris Papailiopoulos