We study a general class of bilevel problems, consisting of the minimization of an upper-level objective which depends on the solution to a parametric fixed-point equation. Important instances arising in machine learning include hyperparameter optimization, meta-learning, and certain graph and recurrent neural networks. Typically, the gradient of the upper-level objective (the hypergradient) is hard or even impossible to compute exactly, which has raised interest in approximation methods. We investigate some popular approaches to compute the hypergradient, based on reverse-mode iterative differentiation and approximate implicit differentiation. Under the hypothesis that the fixed-point equation is defined by a contraction mapping, we present a unified analysis which, for the first time, allows a quantitative comparison of these methods, providing explicit bounds on their iteration complexity. This analysis suggests a hierarchy in terms of computational efficiency among the above methods, with approximate implicit differentiation based on conjugate gradient performing best. We present an extensive experimental comparison among the methods, which confirms the theoretical findings.
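The approximate implicit differentiation (AID) strategy mentioned in the abstract can be illustrated on a toy bilevel problem. The sketch below is not from the paper; the ridge-regression setup, the validation-style outer objective, and all names are illustrative assumptions. It iterates a gradient-step fixed-point map (a contraction for step size 1/L on a strongly convex inner objective) to approximate the inner solution, solves the adjoint linear system with conjugate gradient, assembles the hypergradient, and checks it against the closed-form derivative.

```python
import numpy as np

# Toy bilevel problem (illustrative; not from the paper):
#   inner:  w*(lam) = argmin_w 0.5*||A w - b||^2 + 0.5*lam*||w||^2
#   outer:  f(lam)  = E(w*(lam)) with E(w) = 0.5*||w - w_val||^2
rng = np.random.default_rng(0)
n, d = 20, 5
A = rng.normal(size=(n, d))
b = rng.normal(size=n)
w_val = rng.normal(size=d)                 # hypothetical validation target
lam = 0.5

H = A.T @ A + lam * np.eye(d)              # Hessian of the inner objective
alpha = 1.0 / np.linalg.eigvalsh(H).max()  # step 1/L makes Phi below a contraction

def phi(w):
    """Gradient step on the inner objective: the contraction fixed-point map."""
    return w - alpha * (H @ w - A.T @ b)

# 1) Approximate the inner fixed point by iterating Phi.
w = np.zeros(d)
for _ in range(2000):
    w = phi(w)

# 2) Gradient of the outer objective at the approximate fixed point.
grad_E_w = w - w_val                       # E has no direct dependence on lam here

# 3) Solve (I - d_w Phi)^T v = grad_E_w by conjugate gradient.
#    Here I - d_w Phi = alpha * H, which is symmetric positive definite.
M = alpha * H
v = np.zeros(d)
r = grad_E_w - M @ v
p = r.copy()
for _ in range(50):
    Mp = M @ p
    a = (r @ r) / (p @ Mp)
    v = v + a * p
    r_new = r - a * Mp
    if np.linalg.norm(r_new) < 1e-12:
        break
    p = r_new + ((r_new @ r_new) / (r @ r)) * p
    r = r_new

# 4) Hypergradient = direct term + (d_lam Phi)^T v, with d_lam Phi = -alpha * w.
hypergrad = (-alpha * w) @ v

# Closed-form check: w* = H^{-1} A^T b and dw*/dlam = -H^{-1} w*.
w_star = np.linalg.solve(H, A.T @ b)
exact = (w_star - w_val) @ (-np.linalg.solve(H, w_star))
print(hypergrad, exact)                    # the two values should agree closely
```

Reverse-mode iterative differentiation would instead backpropagate through the 2000 applications of `phi`; AID with conjugate gradient avoids storing that trajectory, which is part of why the paper's analysis favors it.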
Author Information
Riccardo Grazzi (Istituto Italiano di Tecnologia - University College London)
Luca Franceschi (Istituto Italiano di Tecnologia - University College London)
Massimiliano Pontil (Istituto Italiano di Tecnologia and University College London)
Saverio Salzo (Istituto Italiano di Tecnologia)
More from the Same Authors
- 2022 Poster: Bregman Neural Networks
  Jordan Frecon · Gilles Gasso · Massimiliano Pontil · Saverio Salzo
- 2022 Poster: Distribution Regression with Sliced Wasserstein Kernels
  Dimitri Marie Meunier · Massimiliano Pontil · Carlo Ciliberto
- 2022 Spotlight: Bregman Neural Networks
  Jordan Frecon · Gilles Gasso · Massimiliano Pontil · Saverio Salzo
- 2022 Spotlight: Distribution Regression with Sliced Wasserstein Kernels
  Dimitri Marie Meunier · Massimiliano Pontil · Carlo Ciliberto
- 2022 Poster: Batch Greenkhorn Algorithm for Entropic-Regularized Multimarginal Optimal Transport: Linear Rate of Convergence and Iteration Complexity
  Vladimir Kostic · Saverio Salzo · Massimiliano Pontil
- 2022 Spotlight: Batch Greenkhorn Algorithm for Entropic-Regularized Multimarginal Optimal Transport: Linear Rate of Convergence and Iteration Complexity
  Vladimir Kostic · Saverio Salzo · Massimiliano Pontil
- 2021 Poster: Best Model Identification: A Rested Bandit Formulation
  Leonardo Cella · Massimiliano Pontil · Claudio Gentile
- 2021 Spotlight: Best Model Identification: A Rested Bandit Formulation
  Leonardo Cella · Massimiliano Pontil · Claudio Gentile
- 2020 Poster: Meta-learning with Stochastic Linear Bandits
  Leonardo Cella · Alessandro Lazaric · Massimiliano Pontil
- 2019 Poster: Learning-to-Learn Stochastic Gradient Descent with Biased Regularization
  Giulia Denevi · Carlo Ciliberto · Riccardo Grazzi · Massimiliano Pontil
- 2019 Poster: Learning Discrete Structures for Graph Neural Networks
  Luca Franceschi · Mathias Niepert · Massimiliano Pontil · Xiao He
- 2019 Oral: Learning Discrete Structures for Graph Neural Networks
  Luca Franceschi · Mathias Niepert · Massimiliano Pontil · Xiao He
- 2019 Oral: Learning-to-Learn Stochastic Gradient Descent with Biased Regularization
  Giulia Denevi · Carlo Ciliberto · Riccardo Grazzi · Massimiliano Pontil
- 2019 Poster: Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction
  Giulia Luise · Dimitrios Stamos · Massimiliano Pontil · Carlo Ciliberto
- 2019 Oral: Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction
  Giulia Luise · Dimitrios Stamos · Massimiliano Pontil · Carlo Ciliberto
- 2018 Poster: Bilevel Programming for Hyperparameter Optimization and Meta-Learning
  Luca Franceschi · Paolo Frasconi · Saverio Salzo · Riccardo Grazzi · Massimiliano Pontil
- 2018 Oral: Bilevel Programming for Hyperparameter Optimization and Meta-Learning
  Luca Franceschi · Paolo Frasconi · Saverio Salzo · Riccardo Grazzi · Massimiliano Pontil
- 2017 Poster: Forward and Reverse Gradient-Based Hyperparameter Optimization
  Luca Franceschi · Michele Donini · Paolo Frasconi · Massimiliano Pontil
- 2017 Talk: Forward and Reverse Gradient-Based Hyperparameter Optimization
  Luca Franceschi · Michele Donini · Paolo Frasconi · Massimiliano Pontil