Poster

Double Stochasticity Gazes Faster: Snap-Shot Decentralized Stochastic Gradient Tracking Methods

Hao Di · Haishan Ye · Xiangyu Chang · Guang Dai · Ivor Tsang

Hall C 4-9 #2711

Abstract: In decentralized optimization, $m$ agents form a network and communicate only with their neighbors, which offers advantages in data ownership, privacy, and scalability. At the same time, decentralized stochastic gradient descent (SGD) methods, as popular decentralized algorithms for training large-scale machine learning models, have shown their superiority over centralized counterparts. Distributed stochastic gradient tracking (DSGT) has been recognized as a popular and state-of-the-art decentralized SGD method due to its strong theoretical guarantees. However, the theoretical analysis of DSGT shows that its iteration complexity is $\tilde{\mathcal{O}}\left(\frac{\bar{\sigma}^2}{m\mu\varepsilon} + \frac{\sqrt{L}\bar{\sigma}}{\mu(1-\lambda_2(W))^{1/2} C_W \sqrt{\varepsilon}}\right)$, where the doubly stochastic matrix $W$ represents the network topology and $C_W$ is a parameter that depends on $W$. This indicates that the convergence of DSGT is heavily affected by the topology of the communication network. To overcome this weakness of DSGT, we resort to the snap-shot gradient tracking technique and propose two novel algorithms, snap-shot DSGT (SS-DSGT) and accelerated snap-shot DSGT (ASS-DSGT). We further show that SS-DSGT achieves a lower iteration complexity than DSGT on general communication network topologies. Additionally, ASS-DSGT matches DSGT's iteration complexity $\mathcal{O}\left(\frac{\bar{\sigma}^2}{m\mu\varepsilon} + \frac{\sqrt{L}\bar{\sigma}}{\mu(1-\lambda_2(W))^{1/2}\sqrt{\varepsilon}}\right)$ under the same conditions as DSGT. Numerical experiments validate SS-DSGT's superior performance on general communication network topologies and show that ASS-DSGT achieves better practical performance than DSGT for the specified $W$.
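For readers unfamiliar with gradient tracking, the sketch below illustrates the baseline DSGT recursion that the abstract builds on, applied to a toy decentralized least-squares problem. It does not implement the proposed snap-shot variants (SS-DSGT, ASS-DSGT); the problem data, step size, and ring topology are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# Minimal sketch of the classical DSGT recursion (not the snap-shot variants):
#   x_i^{k+1} = sum_j W_ij (x_j^k - alpha * y_j^k)
#   y_i^{k+1} = sum_j W_ij y_j^k + g_i(x_i^{k+1}) - g_i(x_i^k)
# where y_i tracks the network-average stochastic gradient.
# All quantities below (problem data, step size, topology) are illustrative.

rng = np.random.default_rng(0)
m, d, alpha, iters = 8, 5, 0.05, 500

# Local least-squares objectives f_i(x) = 0.5 * ||A_i x - b_i||^2.
A = rng.standard_normal((m, 20, d))
b = rng.standard_normal((m, 20))

def stoch_grad(i, x):
    """Stochastic gradient of f_i at x from one sampled row of (A_i, b_i)."""
    j = rng.integers(A.shape[1])
    a = A[i, j]
    return a * (a @ x - b[i, j])

# Doubly stochastic mixing matrix W for a ring topology.
W = np.zeros((m, m))
for i in range(m):
    W[i, i] = 0.5
    W[i, (i - 1) % m] = 0.25
    W[i, (i + 1) % m] = 0.25

x = np.zeros((m, d))                                   # local iterates
g = np.array([stoch_grad(i, x[i]) for i in range(m)])  # last sampled gradients
y = g.copy()                                           # gradient-tracking variables

for _ in range(iters):
    x_new = W @ (x - alpha * y)                        # mix with neighbors, then descend
    g_new = np.array([stoch_grad(i, x_new[i]) for i in range(m)])
    y = W @ y + g_new - g                              # track the average gradient
    x, g = x_new, g_new

print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
```

The snap-shot idea in the paper modifies how the tracking variable is refreshed so that the dependence on the mixing matrix $W$ is weakened; the recursion above is only the common starting point.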
