Many large-scale machine learning (ML) applications need to perform decentralized learning over datasets generated at different devices and locations. Such datasets pose a significant challenge to decentralized learning because their different contexts result in significant data distribution skew across devices/locations. In this paper, we take a step toward better understanding this challenge by presenting a detailed experimental study of decentralized DNN training on a common type of data skew: skewed distribution of data labels across locations/devices. Our study shows that: (i) skewed data labels are a fundamental and pervasive problem for decentralized learning, causing significant accuracy loss across many ML applications, DNN models, training datasets, and decentralized learning algorithms; (ii) the problem is particularly challenging for DNN models with batch normalization layers; and (iii) the degree of skewness is a key determinant of the difficulty of the problem. Based on these findings, we present SkewScout, a system-level approach that adapts the communication frequency of decentralized learning algorithms to the (skew-induced) accuracy loss between data partitions. We also show that group normalization can recover much of the skew-induced accuracy loss of batch normalization.
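To make the notion of "degree of skewness" concrete, the following is a minimal sketch of how a labeled dataset might be split across locations with a tunable amount of label skew. It is an illustration only, not the paper's partitioning code: the function names, the `skew` parameter, and the rule that assigns each class to an "owning" partition are all assumptions made for this example.

```python
import random
from collections import Counter


def partition_by_label_skew(labels, num_partitions, skew, seed=0):
    """Split sample indices across partitions with a tunable label skew.

    Hypothetical scheme (an assumption, not taken from the paper):
    skew=0.0 assigns every sample to a uniformly random partition (IID);
    skew=1.0 sends every sample to the partition that "owns" its label.
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    # Assumed ownership rule: the k-th class belongs to partition k mod N.
    owner = {c: i % num_partitions for i, c in enumerate(classes)}
    partitions = [[] for _ in range(num_partitions)]
    for idx, label in enumerate(labels):
        if rng.random() < skew:
            partitions[owner[label]].append(idx)  # skewed assignment
        else:
            partitions[rng.randrange(num_partitions)].append(idx)  # uniform
    return partitions


def label_histograms(labels, partitions):
    """Count how many samples of each label ended up in each partition."""
    return [Counter(labels[i] for i in part) for part in partitions]


if __name__ == "__main__":
    toy_labels = [i % 10 for i in range(1000)]  # 10 balanced toy classes
    for skew in (0.0, 0.8, 1.0):
        parts = partition_by_label_skew(toy_labels, num_partitions=2, skew=skew)
        print(skew, label_histograms(toy_labels, parts))
```

Printing the per-partition histograms for several `skew` values shows the label distributions of the two partitions drifting apart as the skew grows, which is the condition the study examines.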
Author Information
Kevin Hsieh (Microsoft Research)
Amar Phanishayee (Microsoft Research)
Onur Mutlu (ETH Zurich)
Phillip Gibbons (CMU)
More from the Same Authors
- 2021 Poster: Memory-Efficient Pipeline-Parallel DNN Training
  Deepak Narayanan · Amar Phanishayee · Kaiyu Shi · Xie Chen · Matei Zaharia
- 2021 Spotlight: Memory-Efficient Pipeline-Parallel DNN Training
  Deepak Narayanan · Amar Phanishayee · Kaiyu Shi · Xie Chen · Matei Zaharia
- 2021 Poster: DriftSurf: Stable-State / Reactive-State Learning under Concept Drift
  Ashraf Tahmasbi · Ellango Jothimurugesan · Srikanta Tirthapura · Phillip Gibbons
- 2021 Spotlight: DriftSurf: Stable-State / Reactive-State Learning under Concept Drift
  Ashraf Tahmasbi · Ellango Jothimurugesan · Srikanta Tirthapura · Phillip Gibbons
- 2021 Poster: Boosting the Throughput and Accelerator Utilization of Specialized CNN Inference Beyond Increasing Batch Size
  Jack Kosaian · Amar Phanishayee · Matthai Philipose · Debadeepta Dey · Rashmi Vinayak
- 2021 Spotlight: Boosting the Throughput and Accelerator Utilization of Specialized CNN Inference Beyond Increasing Batch Size
  Jack Kosaian · Amar Phanishayee · Matthai Philipose · Debadeepta Dey · Rashmi Vinayak