Poster
On the Computation and Communication Complexity of Parallel SGD with Dynamic Batch Sizes for Stochastic Non-Convex Optimization
Hao Yu · Rong Jin
For SGD-based distributed stochastic optimization, the two most important performance metrics are computation complexity, measured by the convergence rate in terms of the number of stochastic gradient calls, and communication complexity, measured by the number of inter-node communication rounds. The classical data-parallel implementation of SGD over N workers achieves a linear speedup of its convergence rate but incurs one inter-node communication round per batch. We study the benefit of using dynamically increasing batch sizes in parallel SGD for stochastic non-convex optimization by characterizing the attained convergence rate and the required number of communication rounds. We show that for stochastic non-convex optimization under the Polyak-Łojasiewicz (PL) condition, classical data-parallel SGD with exponentially increasing batch sizes can achieve the fastest known $O(1/(NT))$ convergence with linear speedup using only $\log(T)$ communication rounds. For general stochastic non-convex optimization, we propose a Catalyst-like algorithm that achieves the fastest known $O(1/\sqrt{NT})$ convergence with only $O(\sqrt{NT}\log(\frac{T}{N}))$ communication rounds.
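A minimal, single-process sketch (illustrative only, not the authors' implementation) of the first scheme described above: data-parallel SGD in which each of N simulated workers draws its own stochastic gradients, gradients are averaged across workers once per batch (one communication round), and the batch size doubles after every round, so the number of communication rounds grows only logarithmically in the total number of gradient calls T. All function and parameter names below are hypothetical.

    import numpy as np

    def parallel_sgd_dynamic_batches(grad_fn, x0, N=4, T=2**12, eta=0.1, seed=0):
        """Toy simulation of data-parallel SGD with exponentially increasing
        batch sizes: each of N workers averages grad_fn over its local batch,
        the workers' gradients are averaged once per batch (one communication
        round), and the batch size doubles every round, so roughly log2(T)
        rounds are needed to process T stochastic gradients per worker."""
        rng = np.random.default_rng(seed)
        x = np.array(x0, dtype=float)
        batch, used, rounds = 1, 0, 0
        while used < T:
            # Each worker computes a stochastic gradient averaged over its batch.
            worker_grads = [
                np.mean([grad_fn(x, rng) for _ in range(batch)], axis=0)
                for _ in range(N)
            ]
            # One inter-node communication round: average across workers and step.
            g = np.mean(worker_grads, axis=0)
            x -= eta * g
            used += batch
            rounds += 1
            batch *= 2  # exponentially increasing batch size
        return x, rounds  # rounds is O(log T)

    # Example: minimize E[(x - 3)^2] from noisy gradients 2*(x - 3) + noise.
    noisy_grad = lambda x, rng: 2 * (x - 3.0) + rng.normal(scale=1.0)
    x_star, rounds = parallel_sgd_dynamic_batches(noisy_grad, x0=0.0)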
Author Information
Hao Yu (Alibaba Group (US) Inc)
Rong Jin (Alibaba Group)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Oral: On the Computation and Communication Complexity of Parallel SGD with Dynamic Batch Sizes for Stochastic Non-Convex Optimization »
  Wed. Jun 12th 09:25 -- 09:30 PM, Room 104
More from the Same Authors
- 2022 Poster: MAE-DET: Revisiting Maximum Entropy Principle in Zero-Shot NAS for Efficient Object Detection »
  Zhenhong Sun · Ming Lin · Xiuyu Sun · Zhiyu Tan · Hao Li · Rong Jin
- 2022 Spotlight: MAE-DET: Revisiting Maximum Entropy Principle in Zero-Shot NAS for Efficient Object Detection »
  Zhenhong Sun · Ming Lin · Xiuyu Sun · Zhiyu Tan · Hao Li · Rong Jin
- 2022 Poster: FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting »
  Tian Zhou · Ziqing Ma · Qingsong Wen · Xue Wang · Liang Sun · Rong Jin
- 2022 Spotlight: FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting »
  Tian Zhou · Ziqing Ma · Qingsong Wen · Xue Wang · Liang Sun · Rong Jin
- 2021 Poster: Dash: Semi-Supervised Learning with Dynamic Thresholding »
  Yi Xu · Lei Shang · Jinxing Ye · Qi Qian · Yu-Feng Li · Baigui Sun · Hao Li · Rong Jin
- 2021 Oral: Dash: Semi-Supervised Learning with Dynamic Thresholding »
  Yi Xu · Lei Shang · Jinxing Ye · Qi Qian · Yu-Feng Li · Baigui Sun · Hao Li · Rong Jin
- 2019 Poster: On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization »
  Hao Yu · Rong Jin · Sen Yang
- 2019 Poster: Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence »
  Yi Xu · Qi Qi · Qihang Lin · Rong Jin · Tianbao Yang
- 2019 Oral: Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence »
  Yi Xu · Qi Qi · Qihang Lin · Rong Jin · Tianbao Yang
- 2019 Oral: On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization »
  Hao Yu · Rong Jin · Sen Yang
- 2018 Poster: Dynamic Regret of Strongly Adaptive Methods »
  Lijun Zhang · Tianbao Yang · Rong Jin · Zhi-Hua Zhou
- 2018 Oral: Dynamic Regret of Strongly Adaptive Methods »
  Lijun Zhang · Tianbao Yang · Rong Jin · Zhi-Hua Zhou