Poster
A Fully First-Order Method for Stochastic Bilevel Optimization
Jeongyeol Kwon · Dohyun Kwon · Stephen Wright · Robert Nowak
We consider stochastic unconstrained bilevel optimization problems when only first-order gradient oracles are available. While numerous optimization methods have been proposed for tackling bilevel problems, existing methods either require potentially expensive computations involving Hessians of the lower-level objectives or lack rigorous finite-time performance guarantees. In this work, we propose a Fully First-order Stochastic Approximation (F2SA) method and study its non-asymptotic convergence properties. Specifically, we show that F2SA converges to an $\epsilon$-stationary solution of the bilevel problem after $\epsilon^{-7/2}, \epsilon^{-5/2}$, and $\epsilon^{-3/2}$ iterations (each iteration using $O(1)$ samples) when stochastic noise is present in both objectives, only in the upper-level objective, or absent (the deterministic setting), respectively. We further show that if we employ momentum-assisted gradient estimators, the iteration complexities can be improved to $\epsilon^{-5/2}, \epsilon^{-4/2}$, and $\epsilon^{-3/2}$, respectively. Finally, we demonstrate the superior practical performance of the proposed method over existing second-order approaches on MNIST data-hypercleaning experiments.
Author Information
Jeongyeol Kwon (University of Wisconsin-Madison)
I am currently a postdoc at the University of Wisconsin-Madison, working with Prof. Robert Nowak. Before joining UW-Madison, I received my Ph.D. from the ECE department at UT Austin, where I spent wonderful years learning and working with my advisor, Prof. Constantine Caramanis. I received my Bachelor's degree in Electrical and Computer Engineering from Seoul National University (SNU) in 2016.
Dohyun Kwon (University of Seoul)
Stephen Wright (University of Wisconsin-Madison)
Robert Nowak (University of Wisconsin-Madison)

Robert Nowak holds the Nosbusch Professorship in Engineering at the University of Wisconsin-Madison, where his research focuses on signal processing, machine learning, optimization, and statistics.
Related Events (a corresponding poster, oral, or spotlight)
- 2023 Oral: A Fully First-Order Method for Stochastic Bilevel Optimization
  Fri. Jul 28, 01:48 – 01:56 AM, Room: Ballroom B
More from the Same Authors
- 2021: On the Sparsity of Deep Neural Networks in the Overparameterized Regime: An Empirical Study
  Rahul Parhi · Jack Wolf · Robert Nowak
- 2023: Algorithm Selection for Deep Active Learning with Imbalanced Datasets
  Jifan Zhang · Shuai Shao · Saurabh Verma · Robert Nowak
- 2023: LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning
  Jifan Zhang · Yifang Chen · Gregory Canal · Stephen Mussmann · Yinglun Zhu · Simon Du · Kevin Jamieson · Robert Nowak
- 2023: Looped Transformers are Better at Learning Learning Algorithms
  Liu Yang · Kangwook Lee · Robert Nowak · Dimitris Papailiopoulos
- 2023: SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits
  Subhojyoti Mukherjee · Qiaomin Xie · Josiah Hanna · Robert Nowak
- 2023: A Representer Theorem for Vector-Valued Neural Networks: Insights on Weight Decay Training and Widths of Deep Neural Networks
  Joseph Shenouda · Rahul Parhi · Kangwook Lee · Robert Nowak
- 2023 Poster: Cyclic Block Coordinate Descent With Variance Reduction for Composite Nonconvex Optimization
  Xufeng Cai · Chaobing Song · Stephen Wright · Jelena Diakonikolas
- 2023 Poster: Cut your Losses with Squentropy
  Like Hui · Misha Belkin · Stephen Wright
- 2023 Poster: Feed Two Birds with One Scone: Exploiting Wild Data for Both Out-of-Distribution Generalization and Detection
  Haoyue Bai · Gregory Canal · Xuefeng Du · Jeongyeol Kwon · Robert Nowak · Sharon Li
- 2023 Poster: Reward-Mixing MDPs with Few Latent Contexts are Learnable
  Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor
- 2023 Poster: Complexity of Block Coordinate Descent with Proximal Regularization and Applications to Wasserstein CP-dictionary Learning
  Dohyun Kwon · Hanbaek Lyu
- 2022 Poster: GALAXY: Graph-based Active Learning at the Extreme
  Jifan Zhang · Julian Katz-Samuels · Robert Nowak
- 2022 Spotlight: GALAXY: Graph-based Active Learning at the Extreme
  Jifan Zhang · Julian Katz-Samuels · Robert Nowak
- 2022 Poster: Training OOD Detectors in their Natural Habitats
  Julian Katz-Samuels · Julia Nakhleh · Robert Nowak · Sharon Li
- 2022 Spotlight: Training OOD Detectors in their Natural Habitats
  Julian Katz-Samuels · Julia Nakhleh · Robert Nowak · Sharon Li
- 2022 Poster: Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms
  Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor
- 2022 Spotlight: Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms
  Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor
- 2021 Poster: Variance Reduction via Primal-Dual Accelerated Dual Averaging for Nonsmooth Convex Finite-Sums
  Chaobing Song · Stephen Wright · Jelena Diakonikolas
- 2021 Oral: Variance Reduction via Primal-Dual Accelerated Dual Averaging for Nonsmooth Convex Finite-Sums
  Chaobing Song · Stephen Wright · Jelena Diakonikolas
- 2020 Poster: Robust Outlier Arm Identification
  Yinglun Zhu · Sumeet Katariya · Robert Nowak
- 2019 Poster: First-Order Algorithms Converge Faster than $O(1/k)$ on Convex Problems
  Ching-pei Lee · Stephen Wright
- 2019 Poster: Bilinear Bandits with Low-rank Structure
  Kwang-Sung Jun · Rebecca Willett · Stephen Wright · Robert Nowak
- 2019 Oral: First-Order Algorithms Converge Faster than $O(1/k)$ on Convex Problems
  Ching-pei Lee · Stephen Wright
- 2019 Oral: Bilinear Bandits with Low-rank Structure
  Kwang-Sung Jun · Rebecca Willett · Stephen Wright · Robert Nowak
- 2019 Poster: Blended Conditional Gradients
  Gábor Braun · Sebastian Pokutta · Dan Tu · Stephen Wright
- 2019 Oral: Blended Conditional Gradients
  Gábor Braun · Sebastian Pokutta · Dan Tu · Stephen Wright
- 2019 Tutorial: Active Learning: From Theory to Practice
  Robert Nowak · Steve Hanneke
- 2018 Poster: Dissipativity Theory for Accelerating Stochastic Variance Reduction: A Unified Analysis of SVRG and Katyusha Using Semidefinite Programs
  Bin Hu · Stephen Wright · Laurent Lessard
- 2018 Oral: Dissipativity Theory for Accelerating Stochastic Variance Reduction: A Unified Analysis of SVRG and Katyusha Using Semidefinite Programs
  Bin Hu · Stephen Wright · Laurent Lessard
- 2017 Poster: Algebraic Variety Models for High-Rank Matrix Completion
  Greg Ongie · Laura Balzano · Rebecca Willett · Robert Nowak
- 2017 Talk: Algebraic Variety Models for High-Rank Matrix Completion
  Greg Ongie · Laura Balzano · Rebecca Willett · Robert Nowak