Oral
First-Order Algorithms Converge Faster than $O(1/k)$ on Convex Problems
Ching-pei Lee · Stephen Wright
Abstract:
It has long been known that both gradient descent and
stochastic coordinate descent achieve a global convergence rate of
$O(1/k)$ in the objective value when applied to minimize an
unconstrained convex function with Lipschitz-continuous gradient.
In this work, we improve this rate to $o(1/k)$.
We extend the result to proximal gradient and proximal coordinate
descent on regularized problems to show similar $o(1/k)$ convergence
rates. The result is tight in the sense that an
$O(1/k^{1+\epsilon})$ rate is not generally attainable for any
$\epsilon>0$, for any of these methods.
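The distinction between $O(1/k)$ and $o(1/k)$ can be observed numerically. The sketch below (not from the paper; the test function and step size are illustrative choices) runs plain gradient descent on $f(x) = x^4$, a convex but not strongly convex function whose gradient is Lipschitz on any bounded set. The classical analysis guarantees $f(x_k) - f^* = O(1/k)$; tracking the scaled suboptimality $k \cdot (f(x_k) - f^*)$ and watching it decay toward zero is consistent with the $o(1/k)$ behavior the abstract describes.

```python
def f(x):
    # Convex, non-strongly-convex test function with minimum f* = 0 at x = 0.
    return x ** 4

def grad(x):
    return 4 * x ** 3

def gradient_descent(x0, step, iters, checkpoints=(100, 1000, 10000)):
    """Run plain gradient descent, recording k * f(x_k) at a few checkpoints.

    If f(x_k) - f* were exactly Theta(1/k), this scaled quantity would
    level off at a positive constant; decay toward 0 indicates o(1/k).
    """
    x = x0
    history = {}
    for k in range(1, iters + 1):
        x = x - step * grad(x)
        if k in checkpoints:
            history[k] = k * f(x)  # scaled suboptimality (f* = 0 here)
    return history

history = gradient_descent(x0=1.0, step=0.1, iters=10000)
```

With these (hypothetical) parameters, `history[100]`, `history[1000]`, and `history[10000]` decrease steadily, so the suboptimality is shrinking strictly faster than $1/k$ on this example.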