Accelerated Dual Method for Distributed Optimization: An Inexact-Gradient View of Local Updates
Abstract
In distributed machine learning, efficiently training across multiple agents with heterogeneous data distributions remains a central challenge. We address the problem of stochastic, strongly convex distributed optimization by applying accelerated gradient ascent to the dual variables and multi-step stochastic gradient descent (SGD) to the primal variables in the Lagrangian formulation. This approach naturally enables local computation, as the inner SGD loops require no inter-agent communication. We prove that the method converges for any number of local updates, attaining the optimal communication complexity when local computation is sufficient. Our analysis builds on an inexact accelerated gradient framework, where the partial gradient of the Lagrangian with respect to the dual variables is treated as an inexact gradient of the dual function. A notable byproduct of this framework is an algorithm that achieves optimal reproducibility guarantees under biased gradient estimates.
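The scheme described above can be illustrated on a toy two-agent consensus problem. The sketch below is a minimal, self-contained illustration only, not the paper's algorithm: it assumes two agents with scalar quadratic objectives f_i(x_i) = ½(x_i − b_i)², a single consensus constraint x_1 = x_2 with dual variable λ, Gaussian gradient noise to emulate stochasticity, and illustrative step sizes and momentum weights. Each communication round runs K local SGD steps on the primal variables (no inter-agent communication), then performs one accelerated ascent step on λ using the partial gradient of the Lagrangian, x_1 − x_2, as an inexact gradient of the dual function.

```python
import random


def inexact_accelerated_dual(b1, b2, K=20, T=200,
                             eta_primal=0.1, eta_dual=0.4,
                             noise=0.01, seed=0):
    """Toy sketch: accelerated dual ascent with K local SGD steps per round.

    Lagrangian: L(x1, x2, lam) = 0.5*(x1-b1)^2 + 0.5*(x2-b2)^2 + lam*(x1-x2).
    The exact dual gradient is x1*(lam) - x2*(lam); here it is replaced by
    the iterates of an inner stochastic gradient loop (the "inexact" view).
    All constants are illustrative, not taken from the paper.
    """
    rng = random.Random(seed)
    x1, x2 = 0.0, 0.0          # primal variables held by the two agents
    lam, lam_prev = 0.0, 0.0   # dual variable (one per constraint)
    for t in range(T):
        # Nesterov-style extrapolation on the dual variable
        # (this is the only step that requires communication).
        mu = lam + (t / (t + 3)) * (lam - lam_prev)
        # K local SGD steps on each agent's primal variable -- no communication.
        for _ in range(K):
            g1 = (x1 - b1) + mu + noise * rng.gauss(0, 1)  # stochastic grad of L wrt x1
            g2 = (x2 - b2) - mu + noise * rng.gauss(0, 1)  # stochastic grad of L wrt x2
            x1 -= eta_primal * g1
            x2 -= eta_primal * g2
        # Ascent step using the inexact dual gradient x1 - x2.
        lam_prev, lam = lam, mu + eta_dual * (x1 - x2)
    return x1, x2, lam
```

For this instance the exact inner minimizer is x1 = b1 − λ, x2 = b2 + λ, so the dual optimum is λ* = (b1 − b2)/2 and both agents converge to the consensus point (b1 + b2)/2; warm-starting the inner loop across rounds keeps the inexactness of the dual gradient small, mirroring the role of local computation in the analysis.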