PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization
Songtao Lu · Mingyi Hong · Zhengdao Wang

Tue Jun 11th 11:00 -- 11:20 AM @ Room 104

Alternating gradient descent (A-GD) is a simple but popular algorithm in machine learning, which updates two blocks of variables in an alternating manner using gradient descent steps. %, in which a gradient step is taken on one block, while keeping the remaining block fixed. In this paper, we consider a smooth unconstrained nonconvex optimization problem, and propose a {\bf p}erturbed {\bf A}-{\bf GD} (PA-GD) which is able to converge (with high probability) to the second-order stationary points (SOSPs) with a global sublinear rate. {Existing analysis on A-GD type algorithm either only guarantees convergence to first-order solutions, or converges to second-order solutions asymptotically (without rates).} To the best of our knowledge, this is the first alternating type algorithm that takes $\mathcal{O}(\text{polylog}(d)/\epsilon^2)$ iterations to achieve an ($\epsilon,\sqrt{\epsilon}$)-SOSP with high probability, where polylog$(d)$ denotes the polynomial of the logarithm with respect to problem dimension $d$.

Author Information

Songtao Lu (University of Minnesota Twin Cities)
Mingyi Hong (University of Minnesota)
Zhengdao Wang (Iowa State University)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors