
Compressed Decentralized Proximal Stochastic Gradient Method for Nonconvex Composite Problems with Heterogeneous Data
Yonggui Yan · Jie Chen · Pin-Yu Chen · Xiaodong Cui · Songtao Lu · Yangyang Xu

Thu Jul 27 01:30 PM -- 03:00 PM (PDT) @ Exhibit Hall 1 #633
We first propose a decentralized proximal stochastic gradient tracking method (DProxSGT) for nonconvex stochastic composite problems, with data heterogeneously distributed on multiple workers in a decentralized connected network. To save communication cost, we then extend DProxSGT to a compressed method by compressing the communicated information. Both methods need only $\mathcal{O}(1)$ samples per worker for each proximal update, which is important to achieve good generalization performance on training deep neural networks. With a smoothness condition on the expected loss function (but not on each sample function), the proposed methods can achieve an optimal sample complexity result to produce a near-stationary point. Numerical experiments on training neural networks demonstrate the significantly better generalization performance of our methods over large-batch training methods and momentum variance-reduction methods, and also their ability to handle heterogeneous data via the gradient tracking scheme.
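To illustrate the two ingredients named in the abstract (a gradient tracker that corrects for heterogeneous local data, and a proximal step for the nonsmooth composite term), here is a minimal, hypothetical sketch of decentralized proximal stochastic gradient tracking on a toy problem. It is not the paper's DProxSGT algorithm or its compressed variant; all names, the ring mixing matrix, the quadratic local losses $f_i(x)=\tfrac12\|x-a_i\|^2$, and the $\ell_1$ regularizer are assumptions chosen for a runnable example. Each worker draws a single noisy gradient per step, matching the $\mathcal{O}(1)$-sample regime:

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (the composite/nonsmooth part).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(0)
n, d = 5, 3                        # workers, problem dimension (toy sizes)
a = rng.normal(size=(n, d))        # heterogeneous data: worker i minimizes 0.5||x - a_i||^2
lam, eta, T = 0.1, 0.05, 3000      # l1 weight, step size, iterations (illustrative values)
sigma = 0.01                       # noise level of the single-sample stochastic gradient

# Doubly stochastic mixing matrix for a ring network (each worker talks to 2 neighbors).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

x = np.zeros((n, d))                                  # local iterates, one row per worker
g_old = x - a + sigma * rng.normal(size=(n, d))       # initial stochastic gradients
y = g_old.copy()                                      # gradient trackers, initialized to g

for _ in range(T):
    # Mix with neighbors, descend along the tracked gradient, then apply the prox.
    x = soft_threshold(W @ x - eta * y, eta * lam)
    g_new = x - a + sigma * rng.normal(size=(n, d))   # one fresh sample per worker
    # Tracking update: preserves mean(y) == mean(g), so y_i estimates the
    # *global* average gradient even though the a_i differ across workers.
    y = W @ y + g_new - g_old
    g_old = g_new

x_bar = x.mean(axis=0)  # consensus estimate; should approach soft_threshold(mean(a), lam)
```

The tracking step is what handles heterogeneity: a plain decentralized SGD with these mismatched `a_i` would hover around each worker's local minimizer, while the tracker steers every worker toward the minimizer of the *average* objective.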

Author Information

Yonggui Yan
Jie Chen (MIT-IBM Watson AI Lab, IBM Research)
Pin-Yu Chen (IBM Research)
Xiaodong Cui
Songtao Lu (IBM Thomas J. Watson Research Center)
Yangyang Xu (Rensselaer Polytechnic Institute)