Sharper Generalization Guarantees for Asynchronous SGD: Beyond Lipschitzness, Smoothness and Data Homogeneity
Yufeng Xie ⋅ Yunwen Lei
Abstract
Asynchronous stochastic gradient descent (ASGD) is widely adopted in distributed and federated learning. In this paper, we develop a sharp generalization analysis for ASGD by leveraging the concept of on-average model stability. For convex and smooth objectives, we establish stability and excess risk bounds under minimal assumptions, removing the assumptions of Lipschitz continuity, bounded noise, and bounded parameter or data domains, while allowing randomly partitioned data and arbitrary delays. Our bounds are optimistic and explicitly characterize the impact of worker participation, recovering the minimax-optimal rate $O(1/\sqrt{mn})$ in balanced regimes, where $mn$ denotes the total sample size, and implying fast rates under low-noise conditions. We further extend the analysis to non-smooth objectives with Hölder-continuous gradients and to heterogeneous data settings via random ASGD, obtaining non-vacuous excess risk guarantees in both settings. Experimental results support our theoretical findings.