ICML Poster Constant Stepsize Local GD for Logistic Regression: Acceleration by Instability

Poster

Constant Stepsize Local GD for Logistic Regression: Acceleration by Instability

Michael Crawshaw · Blake Woodworth · Mingrui Liu

West Exhibition Hall B2-B3 #W-614

[ Abstract ] [ Lay Summary ]

[ Poster] [ OpenReview]

Tue 15 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract: Existing analysis of Local (Stochastic) Gradient Descent for heterogeneous objectives requires stepsizes $\eta \leq 1/K$ where $K$ is the communication interval, which ensures monotonic decrease of the objective. In contrast, we analyze Local Gradient Descent for logistic regression with separable, heterogeneous data using any stepsize $\eta > 0$. With $R$ communication rounds and $M$ clients, we show convergence at a rate $\mathcal{O}(1/\eta K R)$ after an initial unstable phase lasting for $\widetilde{\mathcal{O}}(\eta K M)$ rounds. This improves upon the existing $\mathcal{O}(1/R)$ rate for general smooth, convex objectives. Our analysis parallels the single machine analysis of Wu et al. (2024) in which instability is caused by extremely large stepsizes, but in our setting another source of instability is large local updates with heterogeneous objectives.

Lay Summary:

Machine learning is generally very powerful, but also very expensive in terms of resources such as computer power and data. To mitigate these resource requirements, it is common to train machine learning models in parallel across many devices, such as mobile phones, which helps by leveraging compute power and data from many users. In this paper, we study a classic algorithm for training machine learning models in this distributed manner, and prove that this algorithm can train certain machine learning models much faster than previously understood. Essentially, this acceleration comes from allowing the algorithm to be "unstable", which is usually considered inadvisable, but in our case the instability actually creates acceleration.

Chat is not available.