Poster
Variance Reduction via Primal-Dual Accelerated Dual Averaging for Nonsmooth Convex Finite-Sums
Chaobing Song · Stephen Wright · Jelena Diakonikolas
Structured nonsmooth convex finite-sum optimization appears in many machine learning applications, including support vector machines and least absolute deviation. For the primal-dual formulation of this problem, we propose a novel algorithm called Variance Reduction via Primal-Dual Accelerated Dual Averaging (VRPDA). In the nonsmooth and general convex setting, VRPDA has the overall complexity $O(nd\log\min \{1/\epsilon, n\} + d/\epsilon )$ in terms of the primal-dual gap, where $n$ denotes the number of samples, $d$ the dimension of the primal variables, and $\epsilon$ the desired accuracy. In the nonsmooth and strongly convex setting, the overall complexity of VRPDA becomes $O(nd\log\min\{1/\epsilon, n\} + d/\sqrt{\epsilon})$ in terms of both the primal-dual gap and the distance between iterate and optimal solution. Both of these results improve significantly on the state-of-the-art complexity estimates, which are $O(nd\log \min\{1/\epsilon, n\} + \sqrt{n}d/\epsilon)$ for the nonsmooth and general convex setting and $O(nd\log \min\{1/\epsilon, n\} + \sqrt{n}d/\sqrt{\epsilon})$ for the nonsmooth and strongly convex setting, and they are obtained with a simpler and more straightforward algorithm and analysis. Moreover, both complexities are better than lower bounds for general convex finite-sum optimization, because our approach makes use of additional, commonly occurring structure. Numerical experiments reveal competitive performance of VRPDA compared to state-of-the-art approaches.
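For readers less familiar with this problem class, the following sketch (using illustrative notation $\ell_i$, $a_i$, $b_i$, and $g$ assumed here for exposition, not taken from the paper) shows a structured nonsmooth finite-sum together with the primal-dual saddle-point reformulation that methods of this kind exploit:
$$\min_{x\in\mathbb{R}^d}\; \frac{1}{n}\sum_{i=1}^n \ell_i(a_i^\top x) + g(x) \quad\Longleftrightarrow\quad \min_{x\in\mathbb{R}^d}\max_{y\in\mathbb{R}^n}\; \frac{1}{n}\sum_{i=1}^n \bigl( y_i\, a_i^\top x - \ell_i^*(y_i)\bigr) + g(x),$$
where $\ell_i^*$ denotes the convex conjugate of a closed convex loss $\ell_i$. The hinge loss $\ell_i(t) = \max\{0, 1 - b_i t\}$ (support vector machines) and the absolute loss $\ell_i(t) = |t - b_i|$ (least absolute deviation) are two nonsmooth instances of the applications mentioned above.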
Author Information
Chaobing Song (University of Wisconsin-Madison)
Stephen Wright (University of Wisconsin-Madison)
Jelena Diakonikolas (University of Wisconsin-Madison)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Oral: Variance Reduction via Primal-Dual Accelerated Dual Averaging for Nonsmooth Convex Finite-Sums
  Tue. Jul 20th, 01:00 -- 01:20 PM
More from the Same Authors
- 2023 Oral: A Fully First-Order Method for Stochastic Bilevel Optimization
  Jeongyeol Kwon · Dohyun Kwon · Stephen Wright · Robert Nowak
- 2023 Poster: Cyclic Block Coordinate Descent With Variance Reduction for Composite Nonconvex Optimization
  Xufeng Cai · Chaobing Song · Stephen Wright · Jelena Diakonikolas
- 2023 Oral: Robustly Learning a Single Neuron via Sharpness
  Puqian Wang · Nikos Zarifis · Ilias Diakonikolas · Jelena Diakonikolas
- 2023 Poster: Cut your Losses with Squentropy
  Like Hui · Misha Belkin · Stephen Wright
- 2023 Poster: Robustly Learning a Single Neuron via Sharpness
  Puqian Wang · Nikos Zarifis · Ilias Diakonikolas · Jelena Diakonikolas
- 2023 Poster: A Fully First-Order Method for Stochastic Bilevel Optimization
  Jeongyeol Kwon · Dohyun Kwon · Stephen Wright · Robert Nowak
- 2023 Poster: Accelerated Cyclic Coordinate Dual Averaging with Extrapolation for Composite Convex Optimization
  Cheuk Yin Lin · Chaobing Song · Jelena Diakonikolas
- 2021 Poster: Parameter-free Locally Accelerated Conditional Gradients
  Alejandro Carderera · Jelena Diakonikolas · Cheuk Yin Lin · Sebastian Pokutta
- 2021 Spotlight: Parameter-free Locally Accelerated Conditional Gradients
  Alejandro Carderera · Jelena Diakonikolas · Cheuk Yin Lin · Sebastian Pokutta
- 2019 Poster: First-Order Algorithms Converge Faster than $O(1/k)$ on Convex Problems
  Ching-pei Lee · Stephen Wright
- 2019 Poster: Bilinear Bandits with Low-rank Structure
  Kwang-Sung Jun · Rebecca Willett · Stephen Wright · Robert Nowak
- 2019 Oral: First-Order Algorithms Converge Faster than $O(1/k)$ on Convex Problems
  Ching-pei Lee · Stephen Wright
- 2019 Oral: Bilinear Bandits with Low-rank Structure
  Kwang-Sung Jun · Rebecca Willett · Stephen Wright · Robert Nowak
- 2019 Poster: Blended Conditional Gradients
  Gábor Braun · Sebastian Pokutta · Dan Tu · Stephen Wright
- 2019 Oral: Blended Conditional Gradients
  Gábor Braun · Sebastian Pokutta · Dan Tu · Stephen Wright
- 2018 Poster: Dissipativity Theory for Accelerating Stochastic Variance Reduction: A Unified Analysis of SVRG and Katyusha Using Semidefinite Programs
  Bin Hu · Stephen Wright · Laurent Lessard
- 2018 Oral: Dissipativity Theory for Accelerating Stochastic Variance Reduction: A Unified Analysis of SVRG and Katyusha Using Semidefinite Programs
  Bin Hu · Stephen Wright · Laurent Lessard