Many problems in machine learning involve bilevel optimization (BLO), including hyperparameter optimization, meta-learning, and dataset distillation. Bilevel problems involve inner and outer parameters, each optimized for its own objective. Often, at least one of the two levels is underspecified and there are multiple ways to choose among equivalent optima. Inspired by recent studies of the implicit bias induced by optimization algorithms in single-level optimization, we investigate the implicit bias of different gradient-based algorithms for jointly optimizing the inner and outer parameters. We delineate two standard BLO methods, cold-start and warm-start BLO, and show that the converged solution or long-run behavior depends to a large degree on these and other algorithmic choices, such as the hypergradient approximation. We also show that the solutions from warm-start BLO can encode a surprising amount of information about the outer objective, even when the outer optimization variables are low-dimensional. We believe that implicit bias deserves as central a role in the study of bilevel optimization as it has attained in the study of single-level neural net optimization.
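To make the cold-start/warm-start distinction concrete, here is a minimal sketch (not the authors' code) in JAX. The inner problem fits overparameterized regression weights given an outer regularization parameter, and hypergradients are computed by differentiating through K unrolled inner gradient steps. The toy objectives, unroll length, and step sizes are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of cold-start vs. warm-start BLO (illustrative only).
# Inner problem: fit weights w given outer parameter lam.
# Outer problem: minimize validation loss with respect to lam.
import jax
import jax.numpy as jnp

def inner_loss(w, lam, x, y):
    # Illustrative inner objective: training loss plus lam-weighted penalty.
    return jnp.mean((x @ w - y) ** 2) + lam * jnp.sum(w ** 2)

def outer_loss(w, x_val, y_val):
    # Outer objective: validation loss of the inner solution.
    return jnp.mean((x_val @ w - y_val) ** 2)

def inner_steps(w, lam, x, y, k=20, lr=0.05):
    # K unrolled gradient steps on the inner problem.
    for _ in range(k):
        w = w - lr * jax.grad(inner_loss)(w, lam, x, y)
    return w

def hypergrad(lam, w0, x, y, x_val, y_val):
    # Unrolled-differentiation hypergradient: backprop through inner_steps.
    def obj(lam):
        return outer_loss(inner_steps(w0, lam, x, y), x_val, y_val)
    return jax.grad(obj)(lam)

# 10 examples, 50 weights: the inner problem is underspecified, so the
# choice among its equivalent optima is left to the algorithm.
keys = jax.random.split(jax.random.PRNGKey(0), 4)
x, y = jax.random.normal(keys[0], (10, 50)), jax.random.normal(keys[1], (10,))
x_val, y_val = jax.random.normal(keys[2], (10, 50)), jax.random.normal(keys[3], (10,))

w_init = jnp.zeros(50)
lam_cold = lam_warm = jnp.array(0.1)
w_warm = w_init
for _ in range(100):
    # Cold-start: each outer step re-solves the inner problem from w_init.
    lam_cold = lam_cold - 0.01 * hypergrad(lam_cold, w_init, x, y, x_val, y_val)
    # Warm-start: the inner iterate persists across outer steps, so it can
    # accumulate information about the outer objective.
    lam_warm = lam_warm - 0.01 * hypergrad(lam_warm, w_warm, x, y, x_val, y_val)
    w_warm = inner_steps(w_warm, lam_warm, x, y)
```

In the warm-start loop the inner weights are never reset, which is the mechanism behind the abstract's observation that warm-start solutions can encode information about the outer objective even when the outer variables are low-dimensional.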
Author Information
Paul Vicol (University of Toronto)
Jonathan Lorraine (University of Toronto)
Fabian Pedregosa (Google)
David Duvenaud (University of Toronto)
Roger Grosse (University of Toronto and Vector Institute)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Poster: On Implicit Bias in Overparameterized Bilevel Optimization
  Tue. Jul 19th through Wed. Jul 20th, Room Hall E #615
More from the Same Authors
- 2023: Statistics estimation in neural network training: a recursive identification approach
  Ruth Crasto · Xuchan Bao · Roger Grosse
- 2023: Calibrating Language Models via Augmented Prompt Ensembles
  Mingjian Jiang · Yangjun Ruan · Sicong Huang · Saifei Liao · Silviu Pitis · Roger Grosse · Jimmy Ba
- 2023 Poster: Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single
  Paul Vicol
- 2023 Poster: Efficient Parametric Approximations of Neural Network Function Space Distance
  Nikita Dhawan · Sicong Huang · Juhan Bae · Roger Grosse
- 2023 Poster: Second-order regression models exhibit progressive sharpening to the edge of stability
  Atish Agarwala · Fabian Pedregosa · Jeffrey Pennington
- 2022 Poster: Only tails matter: Average-Case Universality and Robustness in the Convex Regime
  Leonardo Cunha · Gauthier Gidel · Fabian Pedregosa · Damien Scieur · Courtney Paquette
- 2022 Spotlight: Only tails matter: Average-Case Universality and Robustness in the Convex Regime
  Leonardo Cunha · Gauthier Gidel · Fabian Pedregosa · Damien Scieur · Courtney Paquette
- 2021: Implicit Regularization in Overparameterized Bilevel Optimization
  Paul Vicol
- 2021: David Duvenaud
  David Duvenaud
- 2021 Poster: Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition
  Shengyang Sun · Jiaxin Shi · Andrew Wilson · Roger Grosse
- 2021 Spotlight: Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition
  Shengyang Sun · Jiaxin Shi · Andrew Wilson · Roger Grosse
- 2021 Poster: LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning
  Yuhuai Wu · Markus Rabe · Wenda Li · Jimmy Ba · Roger Grosse · Christian Szegedy
- 2021 Poster: Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies
  Paul Vicol · Luke Metz · Jascha Sohl-Dickstein
- 2021 Oral: Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies
  Paul Vicol · Luke Metz · Jascha Sohl-Dickstein
- 2021 Spotlight: LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning
  Yuhuai Wu · Markus Rabe · Wenda Li · Jimmy Ba · Roger Grosse · Christian Szegedy
- 2021 Poster: Oops I Took A Gradient: Scalable Sampling for Discrete Distributions
  Will Grathwohl · Kevin Swersky · Milad Hashemi · David Duvenaud · Chris Maddison
- 2021 Poster: On Monotonic Linear Interpolation of Neural Network Parameters
  James Lucas · Juhan Bae · Michael Zhang · Stanislav Fort · Richard Zemel · Roger Grosse
- 2021 Spotlight: On Monotonic Linear Interpolation of Neural Network Parameters
  James Lucas · Juhan Bae · Michael Zhang · Stanislav Fort · Richard Zemel · Roger Grosse
- 2021 Oral: Oops I Took A Gradient: Scalable Sampling for Discrete Distributions
  Will Grathwohl · Kevin Swersky · Milad Hashemi · David Duvenaud · Chris Maddison
- 2021: Introduction
  Fabian Pedregosa · Courtney Paquette
- 2021 Tutorial: Random Matrix Theory and ML (RMT+ML)
  Fabian Pedregosa · Courtney Paquette · Thomas Trogdon · Jeffrey Pennington
- 2020 Poster: Acceleration through spectral density estimation
  Fabian Pedregosa · Damien Scieur
- 2020 Poster: Universal Asymptotic Optimality of Polyak Momentum
  Damien Scieur · Fabian Pedregosa
- 2020 Poster: Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization
  Geoffrey Negiar · Gideon Dresdner · Alicia Yi-Ting Tsai · Laurent El Ghaoui · Francesco Locatello · Robert Freund · Fabian Pedregosa
- 2020 Poster: Evaluating Lossy Compression Rates of Deep Generative Models
  Sicong Huang · Alireza Makhzani · Yanshuai Cao · Roger Grosse
- 2020 Poster: Learning the Stein Discrepancy for Training and Evaluating Energy-Based Models without Sampling
  Will Grathwohl · Kuan-Chieh Wang · Joern-Henrik Jacobsen · David Duvenaud · Richard Zemel
- 2019 Poster: Sorting Out Lipschitz Function Approximation
  Cem Anil · James Lucas · Roger Grosse
- 2019 Poster: Invertible Residual Networks
  Jens Behrmann · Will Grathwohl · Ricky T. Q. Chen · David Duvenaud · Joern-Henrik Jacobsen
- 2019 Poster: EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis
  Chaoqi Wang · Roger Grosse · Sanja Fidler · Guodong Zhang
- 2019 Oral: EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis
  Chaoqi Wang · Roger Grosse · Sanja Fidler · Guodong Zhang
- 2019 Oral: Sorting Out Lipschitz Function Approximation
  Cem Anil · James Lucas · Roger Grosse
- 2019 Oral: Invertible Residual Networks
  Jens Behrmann · Will Grathwohl · Ricky T. Q. Chen · David Duvenaud · Joern-Henrik Jacobsen
- 2018 Poster: Noisy Natural Gradient as Variational Inference
  Guodong Zhang · Shengyang Sun · David Duvenaud · Roger Grosse
- 2018 Poster: Distilling the Posterior in Bayesian Neural Networks
  Kuan-Chieh Wang · Paul Vicol · James Lucas · Li Gu · Roger Grosse · Richard Zemel
- 2018 Oral: Noisy Natural Gradient as Variational Inference
  Guodong Zhang · Shengyang Sun · David Duvenaud · Roger Grosse
- 2018 Oral: Distilling the Posterior in Bayesian Neural Networks
  Kuan-Chieh Wang · Paul Vicol · James Lucas · Li Gu · Roger Grosse · Richard Zemel
- 2018 Poster: Differentiable Compositional Kernel Learning for Gaussian Processes
  Shengyang Sun · Guodong Zhang · Chaoqi Wang · Wenyuan Zeng · Jiaman Li · Roger Grosse
- 2018 Poster: Inference Suboptimality in Variational Autoencoders
  Chris Cremer · Xuechen Li · David Duvenaud
- 2018 Oral: Inference Suboptimality in Variational Autoencoders
  Chris Cremer · Xuechen Li · David Duvenaud
- 2018 Oral: Differentiable Compositional Kernel Learning for Gaussian Processes
  Shengyang Sun · Guodong Zhang · Chaoqi Wang · Wenyuan Zeng · Jiaman Li · Roger Grosse