Abstract:
In decentralized optimization, agents form a network and communicate only with their neighbors, which offers advantages in data ownership, privacy, and scalability. At the same time, decentralized stochastic gradient descent (DSGD) methods, as popular decentralized algorithms for training large-scale machine learning models, have shown their superiority over centralized counterparts. Distributed stochastic gradient tracking (DSGT) has been recognized as a popular and state-of-the-art decentralized method due to its strong theoretical guarantees. However, the theoretical analysis of DSGT shows that its iteration complexity contains a term governed by a parameter determined by the doubly stochastic matrix W that represents the network topology. Thus, the convergence of DSGT is heavily affected by the topology of the communication network. To overcome this weakness of DSGT, we resort to the snap-shot gradient tracking technique and propose two novel algorithms, snap-shot DSGT (SS-DSGT) and accelerated snap-shot DSGT (ASS-DSGT). We further show that SS-DSGT achieves a lower iteration complexity than DSGT on general communication network topologies, while ASS-DSGT matches DSGT's iteration complexity under the same conditions as DSGT. Numerical experiments validate the superior performance of SS-DSGT on general communication network topologies and exhibit the better practical performance of ASS-DSGT compared to DSGT on the specified network topology.
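Since the abstract's central claim is that DSGT's convergence depends on the mixing matrix, a minimal sketch of the standard (synchronous) DSGT iteration may help show where W enters each update. The ring topology, step size, quadratic toy objective, and all names below are illustrative assumptions, not taken from this paper.

```python
# A minimal sketch of the standard decentralized stochastic gradient tracking
# (DSGT) iteration: x <- W(x - eta*y), y <- W y + g_new - g_old.
# The ring topology, step size, and quadratic toy objective are assumptions
# for illustration only.
import numpy as np

rng = np.random.default_rng(0)

m, d = 8, 5                      # number of agents, problem dimension
A = rng.standard_normal((m, d))  # agent i holds the local objective 0.5*(a_i^T x - b_i)^2
b = rng.standard_normal(m)

def stochastic_grad(i, x):
    """Noisy gradient of agent i's local objective at its iterate x."""
    g = A[i] * (A[i] @ x - b[i])
    return g + 0.01 * rng.standard_normal(d)

# Doubly stochastic mixing matrix W for a ring: average with both neighbors.
W = np.zeros((m, m))
for i in range(m):
    W[i, i] = W[i, (i - 1) % m] = W[i, (i + 1) % m] = 1.0 / 3.0

eta = 0.05                       # step size
X = np.zeros((m, d))             # row i: agent i's iterate x_i
G = np.vstack([stochastic_grad(i, X[i]) for i in range(m)])
Y = G.copy()                     # gradient trackers, initialized to local gradients

for _ in range(500):
    X = W @ (X - eta * Y)        # local gradient step mixed through W
    G_new = np.vstack([stochastic_grad(i, X[i]) for i in range(m)])
    Y = W @ Y + G_new - G        # track the network-average gradient
    G = G_new

print("consensus error:", np.linalg.norm(X - X.mean(axis=0)))
```

The eigenvalues of W (hence the topology) control how fast the consensus and tracking errors contract, which is the dependence the proposed snap-shot variants aim to weaken.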