

Session

Optimization (Convex) 1


Wed 11 July 7:00 - 7:20 PDT

Dissipativity Theory for Accelerating Stochastic Variance Reduction: A Unified Analysis of SVRG and Katyusha Using Semidefinite Programs

Bin Hu · Stephen Wright · Laurent Lessard

Techniques for reducing the variance of gradient estimates used in stochastic programming algorithms for convex finite-sum problems have received a great deal of attention in recent years. By leveraging dissipativity theory from control, we provide a new perspective on two important variance-reduction algorithms: SVRG and its direct accelerated variant Katyusha. Our perspective provides a physically intuitive understanding of the behavior of SVRG-like methods via a principle of energy conservation. The tools discussed here allow us to automate the convergence analysis of SVRG-like methods by capturing their essential properties in small semidefinite programs amenable to standard analysis and computational techniques. Our approach recovers existing convergence results for SVRG and Katyusha and generalizes the theory to alternative parameter choices. We also discuss how our approach complements the linear coupling technique. Our combination of perspectives leads to a better understanding of accelerated variance-reduced stochastic methods for finite-sum problems.
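The SVRG scheme this analysis targets can be sketched in a few lines. This is a generic, minimal implementation of the standard SVRG outer/inner loop (not the authors' dissipativity or SDP machinery), and the names `svrg` and `grad_i` as well as the parameter choices are illustrative assumptions.

```python
import numpy as np

def svrg(grad_i, x0, n, step, epochs, inner_iters, rng=None):
    """Minimal SVRG sketch for f(x) = (1/n) * sum_i f_i(x).
    grad_i(x, i) must return the gradient of the i-th component f_i."""
    rng = np.random.default_rng(rng)
    x = x0.astype(float).copy()
    for _ in range(epochs):
        snapshot = x.copy()
        # full gradient at the snapshot point
        full_grad = np.mean([grad_i(snapshot, i) for i in range(n)], axis=0)
        for _ in range(inner_iters):
            i = rng.integers(n)
            # variance-reduced stochastic gradient estimate:
            # unbiased, and its variance vanishes as x approaches the optimum
            g = grad_i(x, i) - grad_i(snapshot, i) + full_grad
            x = x - step * g
    return x
```

The key property exploited by the convergence analysis is that the correction term `grad_i(snapshot, i) - full_grad` keeps the estimate unbiased while driving its variance to zero at the optimum, which is what enables linear convergence without decaying step sizes.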

Wed 11 July 7:20 - 7:30 PDT

Lyapunov Functions for First-Order Methods: Tight Automated Convergence Guarantees

Adrien Taylor · Bryan Van Scoy · Laurent Lessard

We present a novel way of generating Lyapunov functions for proving linear convergence rates of first-order optimization methods. Our approach provably obtains the fastest linear convergence rate that can be verified by a quadratic Lyapunov function (with given states), and only relies on solving a small-sized semidefinite program. Our approach combines the advantages of performance estimation problems (PEP, due to Drori and Teboulle (2014)) and integral quadratic constraints (IQC, due to Lessard et al. (2016)), and relies on convex interpolation (due to Taylor et al. (2017c;b)).
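As a toy numerical illustration of a quadratic Lyapunov function certifying a linear rate (this is a standard textbook example, not the paper's SDP construction, and all constants below are assumptions for the sketch), one can check the classical contraction of gradient descent on a smooth, strongly convex quadratic:

```python
import numpy as np

# For gradient descent on an L-smooth, mu-strongly convex quadratic
# f(x) = 0.5 * x^T H x (so x* = 0), the quadratic Lyapunov function
# V(x) = ||x||^2 certifies the linear rate rho = max(|1 - a*mu|, |1 - a*L|).
mu, L = 1.0, 10.0
a = 2.0 / (mu + L)                          # classical optimal step size
rho = max(abs(1 - a * mu), abs(1 - a * L))  # = (L - mu) / (L + mu)

H = np.diag(np.linspace(mu, L, 5))          # eigenvalues in [mu, L]
x = np.random.default_rng(0).standard_normal(5)
for _ in range(20):
    x_next = x - a * (H @ x)
    # per-step contraction certified by the Lyapunov function
    assert np.linalg.norm(x_next) <= rho * np.linalg.norm(x) + 1e-12
    x = x_next
```

The paper's contribution is, roughly, to search over such Lyapunov certificates automatically via a small semidefinite program, rather than verifying a hand-picked one as above.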

Wed 11 July 7:30 - 7:40 PDT

ADMM and Accelerated ADMM as Continuous Dynamical Systems

Guilherme Franca · Daniel Robinson · Rene Vidal

Recently, there has been an increasing interest in using tools from dynamical systems to analyze the behavior of simple optimization algorithms such as gradient descent and accelerated variants. This paper strengthens such connections by deriving the differential equations that model the continuous limit of the sequence of iterates generated by the alternating direction method of multipliers, as well as an accelerated variant. We employ the direct method of Lyapunov to analyze the stability of critical points of the dynamical systems and to obtain associated convergence rates.
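A familiar instance of this continuous-limit viewpoint (shown here for plain gradient descent, not the ADMM derivation of the paper) is:

```latex
x_{k+1} = x_k - h \nabla f(x_k)
\;\xrightarrow{\,h \to 0\,}\;
\dot{x}(t) = -\nabla f\bigl(x(t)\bigr),
\qquad
\dot{V} = \bigl\langle \nabla f(x), \dot{x} \bigr\rangle
        = -\|\nabla f(x)\|^2 \le 0
\quad\text{for } V(x) = f(x) - f(x^\star).
```

The decrease of the Lyapunov function $V$ along trajectories is what yields stability and convergence-rate statements; the paper carries out the analogous program for the (accelerated) ADMM dynamics.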

Wed 11 July 7:40 - 7:50 PDT

Computational Optimal Transport: Complexity by Accelerated Gradient Descent Is Better Than by Sinkhorn's Algorithm

Pavel Dvurechenskii · Alexander Gasnikov · Alexey Kroshnin

We analyze two algorithms for approximating the general optimal transport (OT) distance between two discrete distributions of size $n$, up to accuracy $\varepsilon$. For the first algorithm, which is based on the celebrated Sinkhorn's algorithm, we prove the complexity bound $\widetilde{O}\left(\frac{n^2}{\varepsilon^2}\right)$ arithmetic operations ($\widetilde{O}$ hides polylogarithmic factors $(\ln n)^c$, $c>0$). For the second one, which is based on our novel Adaptive Primal-Dual Accelerated Gradient Descent (APDAGD) algorithm, we prove the complexity bound $\widetilde{O}\left(\min\left\{\frac{n^{9/4}}{\varepsilon}, \frac{n^{2}}{\varepsilon^2} \right\}\right)$ arithmetic operations. Both bounds have better dependence on $\varepsilon$ than the state-of-the-art result given by $\widetilde{O}\left(\frac{n^2}{\varepsilon^3}\right)$. Our second algorithm not only has better dependence on $\varepsilon$ in the complexity bound, but also is not specific to entropic regularization and can solve the OT problem with different regularizers.
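The first algorithm builds on the classical Sinkhorn iterations for entropy-regularized OT, which can be sketched generically as follows (an illustrative minimal version, not the paper's analyzed variant; `eps` denotes the entropic regularization strength):

```python
import numpy as np

def sinkhorn(C, r, c, eps, n_iters=1000):
    """Sinkhorn iterations for entropy-regularized optimal transport.
    C: (n, m) cost matrix; r, c: source/target marginals summing to 1;
    eps: entropic regularization strength."""
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(r)
    for _ in range(n_iters):
        v = c / (K.T @ u)         # scale to match column marginals
        u = r / (K @ v)           # scale to match row marginals
    return u[:, None] * K * v[None, :]   # approximate transport plan
```

Each iteration costs $O(n^2)$ arithmetic operations, which is why the iteration count's dependence on $\varepsilon$ drives the overall complexity bounds discussed in the abstract.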

Wed 11 July 7:50 - 8:00 PDT

An Efficient Semismooth Newton based Algorithm for Convex Clustering

Yancheng Yuan · Defeng Sun · Kim-Chuan Toh

Clustering is a fundamental problem in unsupervised learning. Popular methods like K-means may suffer from instability, as they are prone to getting stuck in local minima. Recently, the sum-of-norms (SON) model (also known as the clustering path), a convex relaxation of the hierarchical clustering model, was proposed in (Lindsten et al., 2011) and (Hocking et al., 2011). Although numerical algorithms like the alternating direction method of multipliers (ADMM) and the alternating minimization algorithm (AMA) have been proposed to solve the convex clustering model (Chi & Lange, 2015), solving large-scale problems remains very challenging. In this paper, we propose a semismooth Newton based augmented Lagrangian method for large-scale convex clustering problems. Extensive numerical experiments on both simulated and real data demonstrate that our algorithm is highly efficient and robust for solving large-scale problems. Moreover, the numerical results also show the superior performance and scalability of our algorithm compared to existing first-order methods.
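The sum-of-norms objective underlying convex clustering can be written down directly. The sketch below is the unweighted form (practical variants attach pair weights $w_{ij}$ to the fusion terms, which are omitted here), and `son_objective` is an illustrative name, not from the paper:

```python
import numpy as np

def son_objective(U, A, gamma):
    """Unweighted sum-of-norms (SON) convex clustering objective:
    0.5 * sum_i ||u_i - a_i||^2 + gamma * sum_{i<j} ||u_i - u_j||_2.
    Rows of A are data points; rows of U are their assigned centroids."""
    fit = 0.5 * np.sum((U - A) ** 2)
    n = A.shape[0]
    # fusion penalty: the non-smooth norms (not squared) encourage
    # centroids to merge exactly, tracing out the clustering path in gamma
    reg = sum(np.linalg.norm(U[i] - U[j])
              for i in range(n) for j in range(i + 1, n))
    return fit + gamma * reg
```

At $\gamma = 0$ the minimizer is $U = A$ (every point its own cluster); as $\gamma$ grows, centroids fuse, which is why the solution path is a convex analogue of hierarchical clustering.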