Recent developments in large-scale distributed machine learning applications, e.g., deep neural networks, benefit enormously from advances in distributed non-convex optimization techniques such as distributed Stochastic Gradient Descent (SGD). A series of recent works studies the linear speedup property of distributed SGD variants with reduced communication. The linear speedup property allows us to scale out computing capability by adding more computing nodes to the system, while reduced communication complexity is desirable because communication overhead is often the performance bottleneck in distributed systems. Recently, momentum methods have been increasingly adopted by practitioners to train machine learning models, since they often converge faster and generalize better. However, it remains unclear whether any distributed momentum SGD possesses the same linear speedup property as distributed SGD while also achieving reduced communication complexity. This paper fills the gap by considering a distributed communication-efficient momentum SGD method and proving its linear speedup property.
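To make the general idea concrete, the sketch below simulates one plausible instantiation of communication-efficient distributed momentum SGD: each worker runs local heavy-ball momentum SGD and workers synchronize only periodically by averaging their models. This is an illustrative sketch, not necessarily the exact algorithm analyzed in the paper; the function names (`local_momentum_sgd`, `grad_fn`) and all hyperparameter values are hypothetical choices for this example.

```python
# Minimal sketch of communication-efficient momentum SGD:
# workers run local momentum SGD and average their models every `sync_every` steps.
import numpy as np

def local_momentum_sgd(grad_fn, x0, num_workers=8, steps=1000,
                       lr=0.05, beta=0.9, sync_every=10, seed=0):
    """Simulate `num_workers` workers; `grad_fn(x, rng)` returns a stochastic gradient at x."""
    rng = np.random.default_rng(seed)
    x = np.tile(x0, (num_workers, 1)).astype(float)  # per-worker model copies
    v = np.zeros_like(x)                             # per-worker momentum buffers
    for t in range(1, steps + 1):
        for i in range(num_workers):
            g = grad_fn(x[i], rng)                   # stochastic gradient on worker i
            v[i] = beta * v[i] + g                   # heavy-ball momentum update
            x[i] = x[i] - lr * v[i]
        if t % sync_every == 0:                      # communication round
            x[:] = x.mean(axis=0)                    # average models across workers
    return x.mean(axis=0)

# Example: minimize a noisy quadratic f(x) = 0.5 * ||x||^2
if __name__ == "__main__":
    grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
    x_final = local_momentum_sgd(grad, x0=np.ones(5))
    print(np.linalg.norm(x_final))  # should be close to 0
```

Because averaging happens only every `sync_every` local steps, the number of communication rounds is reduced by that factor relative to fully synchronous SGD, which is the kind of trade-off the paper's linear speedup analysis addresses.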
Author Information
Hao Yu (Alibaba Group (US) Inc)
Rong Jin (Alibaba Group)
Sen Yang
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Oral: On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization »
  Wed. Jun 12th, 06:20 -- 06:25 PM, Room 103
More from the Same Authors
- 2022 Poster: MAE-DET: Revisiting Maximum Entropy Principle in Zero-Shot NAS for Efficient Object Detection »
  Zhenhong Sun · Ming Lin · Xiuyu Sun · Zhiyu Tan · Hao Li · Rong Jin
- 2022 Spotlight: MAE-DET: Revisiting Maximum Entropy Principle in Zero-Shot NAS for Efficient Object Detection »
  Zhenhong Sun · Ming Lin · Xiuyu Sun · Zhiyu Tan · Hao Li · Rong Jin
- 2022 Poster: FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting »
  Tian Zhou · Ziqing Ma · Qingsong Wen · Xue Wang · Liang Sun · Rong Jin
- 2022 Spotlight: FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting »
  Tian Zhou · Ziqing Ma · Qingsong Wen · Xue Wang · Liang Sun · Rong Jin
- 2021 Poster: Dash: Semi-Supervised Learning with Dynamic Thresholding »
  Yi Xu · Lei Shang · Jinxing Ye · Qi Qian · Yu-Feng Li · Baigui Sun · Hao Li · Rong Jin
- 2021 Oral: Dash: Semi-Supervised Learning with Dynamic Thresholding »
  Yi Xu · Lei Shang · Jinxing Ye · Qi Qian · Yu-Feng Li · Baigui Sun · Hao Li · Rong Jin
- 2019 Poster: On the Computation and Communication Complexity of Parallel SGD with Dynamic Batch Sizes for Stochastic Non-Convex Optimization »
  Hao Yu · Rong Jin
- 2019 Poster: Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence »
  Yi Xu · Qi Qi · Qihang Lin · Rong Jin · Tianbao Yang
- 2019 Oral: Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence »
  Yi Xu · Qi Qi · Qihang Lin · Rong Jin · Tianbao Yang
- 2019 Oral: On the Computation and Communication Complexity of Parallel SGD with Dynamic Batch Sizes for Stochastic Non-Convex Optimization »
  Hao Yu · Rong Jin
- 2018 Poster: Dynamic Regret of Strongly Adaptive Methods »
  Lijun Zhang · Tianbao Yang · Rong Jin · Zhi-Hua Zhou
- 2018 Oral: Dynamic Regret of Strongly Adaptive Methods »
  Lijun Zhang · Tianbao Yang · Rong Jin · Zhi-Hua Zhou