We study the problem of learning-to-learn: inferring a learning algorithm that works well on a family of tasks sampled from an unknown distribution. As the class of algorithms, we consider Stochastic Gradient Descent (SGD) on the true risk regularized by the squared Euclidean distance from a bias vector. We present an average excess risk bound for such a learning algorithm that quantifies the potential benefit of using a bias vector with respect to the unbiased case. We then propose a novel meta-algorithm to estimate the bias term online from a sequence of observed tasks. The small memory footprint and low time complexity of our approach make it appealing in practice, while our theoretical analysis provides guarantees on the generalization properties of the meta-algorithm on new tasks. A key feature of our results is that, when the number of tasks grows and their variance is relatively small, our learning-to-learn approach has a significant advantage over learning each task in isolation by standard SGD without a bias term. Numerical experiments demonstrate the effectiveness of our approach in practice.
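The setting in the abstract can be illustrated with a minimal numerical sketch (not the paper's exact meta-algorithm): within each task, run SGD on a least-squares risk with the biased regularizer (lam/2)·||w − h||², and estimate the bias h online as a running mean of the averaged within-task iterates. All function names, step sizes, and the synthetic task distribution below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_within_task(X, y, h, lam=1.0, n_steps=200, step=0.05):
    """SGD on the least-squares risk plus (lam/2)*||w - h||^2.

    Starts at the bias vector h and returns the running average of the
    iterates (a common estimator in excess-risk analyses of SGD).
    """
    w = h.copy()
    avg = np.zeros_like(w)
    for t in range(n_steps):
        i = rng.integers(len(X))                      # sample one example
        grad = (X[i] @ w - y[i]) * X[i]               # stochastic loss gradient
        grad += lam * (w - h)                         # biased-regularization term
        w = w - step * grad                           # constant step, for simplicity
        avg += (w - avg) / (t + 1)                    # running average of iterates
    return avg

def meta_train(tasks, d, lam=1.0):
    """Estimate the bias h online as a running mean of per-task solutions
    (a simple stand-in for the paper's online meta-algorithm)."""
    h = np.zeros(d)
    for k, (X, y) in enumerate(tasks):
        w_task = sgd_within_task(X, y, h, lam=lam)
        h += (w_task - h) / (k + 1)                   # online mean update
    return h

# Synthetic tasks whose target vectors cluster around a common w_star:
# the low-variance regime where a shared bias should help.
d, n = 5, 50
w_star = np.ones(d)
tasks = []
for _ in range(20):
    w_k = w_star + 0.1 * rng.standard_normal(d)       # small task variance
    X = rng.standard_normal((n, d))
    y = X @ w_k + 0.01 * rng.standard_normal(n)
    tasks.append((X, y))

h = meta_train(tasks, d)
print(np.round(h, 2))                                 # should drift toward w_star
```

With many tasks of small variance, the estimated bias h approaches the common target w_star, so SGD initialized and regularized around h starts much closer to each task's solution than the unbiased (h = 0) baseline.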
Author Information
Giulia Denevi (IIT)
Carlo Ciliberto (Imperial College London)
Riccardo Grazzi (Istituto Italiano di Tecnologia - University College London)
Massimiliano Pontil (University College London)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Oral: Learning-to-Learn Stochastic Gradient Descent with Biased Regularization
  Wed. Jun 12th, 10:05 -- 10:10 PM, Room 201
More from the Same Authors
- 2022 Poster: Measuring dissimilarity with diffeomorphism invariance
  Théophile Cantelobre · Carlo Ciliberto · Benjamin Guedj · Alessandro Rudi
- 2022 Poster: Distribution Regression with Sliced Wasserstein Kernels
  Dimitri Marie Meunier · Massimiliano Pontil · Carlo Ciliberto
- 2022 Spotlight: Distribution Regression with Sliced Wasserstein Kernels
  Dimitri Marie Meunier · Massimiliano Pontil · Carlo Ciliberto
- 2022 Spotlight: Measuring dissimilarity with diffeomorphism invariance
  Théophile Cantelobre · Carlo Ciliberto · Benjamin Guedj · Alessandro Rudi
- 2020 Poster: On the Iteration Complexity of Hypergradient Computation
  Riccardo Grazzi · Luca Franceschi · Massimiliano Pontil · Saverio Salzo
- 2019 Poster: Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction
  Giulia Luise · Dimitrios Stamos · Massimiliano Pontil · Carlo Ciliberto
- 2019 Poster: Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation
  Ruohan Wang · Carlo Ciliberto · Pierluigi Vito Amadori · Yiannis Demiris
- 2019 Oral: Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction
  Giulia Luise · Dimitrios Stamos · Massimiliano Pontil · Carlo Ciliberto
- 2019 Oral: Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation
  Ruohan Wang · Carlo Ciliberto · Pierluigi Vito Amadori · Yiannis Demiris
- 2018 Poster: Bilevel Programming for Hyperparameter Optimization and Meta-Learning
  Luca Franceschi · Paolo Frasconi · Saverio Salzo · Riccardo Grazzi · Massimiliano Pontil
- 2018 Oral: Bilevel Programming for Hyperparameter Optimization and Meta-Learning
  Luca Franceschi · Paolo Frasconi · Saverio Salzo · Riccardo Grazzi · Massimiliano Pontil
- 2017 Poster: Forward and Reverse Gradient-Based Hyperparameter Optimization
  Luca Franceschi · Michele Donini · Paolo Frasconi · Massimiliano Pontil
- 2017 Talk: Forward and Reverse Gradient-Based Hyperparameter Optimization
  Luca Franceschi · Michele Donini · Paolo Frasconi · Massimiliano Pontil