ICML Poster On the Complexity of Finite-Sum Smooth Optimization under the Polyak

Spotlight Poster

On the Complexity of Finite-Sum Smooth Optimization under the Polyak–Łojasiewicz Condition

Yunyan Bai · Yuxing Liu · Luo Luo

Hall C 4-9 #2607


Abstract: This paper considers the optimization problem of the form $\min_{{\bf x}\in{\mathbb R}^d} f({\bf x})\triangleq \frac{1}{n}\sum_{i=1}^n f_i({\bf x})$, where $f(\cdot)$ satisfies the Polyak–Łojasiewicz (PL) condition with parameter $\mu$ and $\{f_i(\cdot)\}_{i=1}^n$ is $L$-mean-squared smooth. We show that any gradient method requires at least $\Omega(n+\kappa\sqrt{n}\log(1/\epsilon))$ incremental first-order oracle (IFO) calls to find an $\epsilon$-suboptimal solution, where $\kappa\triangleq L/\mu$ is the condition number of the problem. This result nearly matches the upper bounds on IFO complexity of the best-known first-order methods. We also study the problem of minimizing the PL function in the distributed setting, where the individual functions $f_1(\cdot),\dots,f_n(\cdot)$ are located on a connected network of $n$ agents. We provide lower bounds of $\Omega(\kappa/\sqrt{\gamma}\log(1/\epsilon))$, $\Omega((\kappa+\tau\kappa/\sqrt{\gamma})\log(1/\epsilon))$ and $\Omega\big(n+\kappa\sqrt{n}\log(1/\epsilon)\big)$ for communication rounds, time cost and local first-order oracle calls respectively, where $\gamma\in(0,1]$ is the spectral gap of the mixing matrix associated with the network and $\tau>0$ is the time cost per communication round. Furthermore, we propose a decentralized first-order method that nearly matches the above lower bounds in expectation.
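The quantities in the abstract can be made concrete on a toy problem. The sketch below is illustrative only and is not taken from the paper. It assumes the standard definitions, which the abstract does not spell out: the PL condition $f({\bf x})-f^*\le\frac{1}{2\mu}\|\nabla f({\bf x})\|^2$ and mean-squared smoothness $\frac{1}{n}\sum_{i=1}^n\|\nabla f_i({\bf x})-\nabla f_i({\bf y})\|^2\le L^2\|{\bf x}-{\bf y}\|^2$. The example problem (rank-deficient least squares, which is PL but not strongly convex), the sizes, and all variable names are hypothetical choices. The final number is only the expression inside $\Omega(\cdot)$, with the hidden constants ignored.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, rank = 50, 10, 4  # hypothetical sizes; rank < d, so f is not strongly convex

# Finite-sum least squares: f_i(x) = 0.5 * (a_i^T x - b_i)^2, f = (1/n) * sum_i f_i.
A = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, d))
b = rng.standard_normal(n)

def f(x):
    return 0.5 * np.mean((A @ x - b) ** 2)

def grad_f(x):
    return A.T @ (A @ x - b) / n

# Optimal value via a least-squares solve (minimum-norm minimizer).
x_star = np.linalg.lstsq(A, b, rcond=None)[0]
f_star = f(x_star)

# For this problem the PL parameter is the smallest nonzero eigenvalue of A^T A / n.
eigs = np.linalg.eigvalsh(A.T @ A / n)
mu = eigs[eigs > 1e-10 * eigs.max()].min()

# Mean-squared smoothness: L^2 is the top eigenvalue of (1/n) * sum_i ||a_i||^2 a_i a_i^T.
row_norms_sq = np.sum(A ** 2, axis=1, keepdims=True)
M = (A * row_norms_sq).T @ A / n
L = np.sqrt(np.linalg.eigvalsh(M).max())
kappa = L / mu

# Numerically check the PL inequality at random points (should be <= 0 up to rounding).
xs = rng.standard_normal((1000, d))
pl_violation = max(f(x) - f_star - grad_f(x) @ grad_f(x) / (2 * mu) for x in xs)

# Expression inside the Omega(.) lower bound, constants ignored.
eps = 1e-6
ifo_scale = n + kappa * np.sqrt(n) * np.log(1 / eps)

print(f"mu={mu:.4g}, L={L:.4g}, kappa={kappa:.4g}")
print(f"max PL violation={pl_violation:.3g}, n + kappa*sqrt(n)*log(1/eps)={ifo_scale:.4g}")
```

The least-squares instance is used because its PL parameter and mean-squared smoothness constant have closed forms, so $\kappa$ can be computed exactly rather than estimated.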
