We study the problem of learning-to-learn: inferring a learning algorithm that works well on tasks sampled from an unknown distribution. As the class of algorithms, we consider Stochastic Gradient Descent on the true risk regularized by the squared Euclidean distance to a bias vector. We present an average excess risk bound for such a learning algorithm, which quantifies the potential benefit of using a bias vector relative to the unbiased case. We then address the problem of estimating the bias from a sequence of tasks. We propose a meta-algorithm which incrementally updates the bias as new tasks are observed; its low space and time complexity makes it appealing in practice. We provide guarantees on the learning ability of the meta-algorithm. A key feature of our results is that, when the number of tasks grows and their variance is relatively small, our learning-to-learn approach has a significant advantage over learning each task in isolation by Stochastic Gradient Descent without a bias term. We report on numerical experiments which demonstrate the effectiveness of our approach.
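To make the setup concrete, below is a minimal sketch of the two-level scheme the abstract describes: within each task, SGD minimizes the loss plus the squared Euclidean distance to a bias vector; across tasks, the bias is updated incrementally. The sketch assumes a least-squares loss, a 1/(lam*k) inner step size, and a single meta gradient step per task using -lam*(w_task - bias) as an approximate meta-gradient; these choices, and names such as `inner_sgd`, `meta_update`, and `meta_lr`, are illustrative assumptions and not the paper's exact algorithm or constants.

```python
import numpy as np

def inner_sgd(X, y, bias, lam, n_steps, seed=0):
    """Within-task SGD on the biased-regularized objective
    (1/2)(<w, x> - y)^2 + (lam/2)||w - bias||^2 (least-squares loss assumed).
    Returns the running average of the iterates."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    w = bias.copy()
    avg = np.zeros(d)
    for k in range(1, n_steps + 1):
        i = rng.integers(len(y))
        grad = (X[i] @ w - y[i]) * X[i] + lam * (w - bias)
        w = w - grad / (lam * k)        # illustrative step size 1/(lam*k)
        avg += (w - avg) / k            # average of iterates
    return avg

def meta_update(bias, w_task, lam, meta_lr):
    """Incremental update of the bias after one task: a single gradient step,
    with the meta-gradient approximated by -lam * (w_task - bias)."""
    return bias + meta_lr * lam * (w_task - bias)

# Toy usage: a stream of tasks whose target vectors cluster around w_star
# (the low-variance regime in which a shared bias is expected to help).
d, lam, meta_lr = 5, 1.0, 0.1
rng = np.random.default_rng(0)
w_star = rng.normal(size=d)
bias = np.zeros(d)
for t in range(50):
    w_t = w_star + 0.1 * rng.normal(size=d)   # task-specific target, small variance
    X = rng.normal(size=(30, d))
    y = X @ w_t + 0.01 * rng.normal(size=30)
    w_hat = inner_sgd(X, y, bias, lam, n_steps=100, seed=t)
    bias = meta_update(bias, w_hat, lam, meta_lr)
print("distance of learned bias to task mean:", np.linalg.norm(bias - w_star))
```

In this toy run the bias drifts toward the common component of the task vectors, which is the mechanism by which biased regularization can outperform learning each task in isolation when task variance is small.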
Author Information
Giulia Denevi (IIT)
Carlo Ciliberto (Imperial College London)
Riccardo Grazzi (Istituto Italiano di Tecnologia - University College London)
Massimiliano Pontil (University College London)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Poster: Learning-to-Learn Stochastic Gradient Descent with Biased Regularization
  Thu. Jun 13th 01:30 -- 04:00 AM, Room Pacific Ballroom #257
More from the Same Authors
- 2022 Poster: Measuring dissimilarity with diffeomorphism invariance
  Théophile Cantelobre · Carlo Ciliberto · Benjamin Guedj · Alessandro Rudi
- 2022 Poster: Distribution Regression with Sliced Wasserstein Kernels
  Dimitri Marie Meunier · Massimiliano Pontil · Carlo Ciliberto
- 2022 Spotlight: Distribution Regression with Sliced Wasserstein Kernels
  Dimitri Marie Meunier · Massimiliano Pontil · Carlo Ciliberto
- 2022 Spotlight: Measuring dissimilarity with diffeomorphism invariance
  Théophile Cantelobre · Carlo Ciliberto · Benjamin Guedj · Alessandro Rudi
- 2020 Poster: On the Iteration Complexity of Hypergradient Computation
  Riccardo Grazzi · Luca Franceschi · Massimiliano Pontil · Saverio Salzo
- 2019 Poster: Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction
  Giulia Luise · Dimitrios Stamos · Massimiliano Pontil · Carlo Ciliberto
- 2019 Poster: Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation
  Ruohan Wang · Carlo Ciliberto · Pierluigi Vito Amadori · Yiannis Demiris
- 2019 Oral: Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction
  Giulia Luise · Dimitrios Stamos · Massimiliano Pontil · Carlo Ciliberto
- 2019 Oral: Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation
  Ruohan Wang · Carlo Ciliberto · Pierluigi Vito Amadori · Yiannis Demiris
- 2018 Poster: Bilevel Programming for Hyperparameter Optimization and Meta-Learning
  Luca Franceschi · Paolo Frasconi · Saverio Salzo · Riccardo Grazzi · Massimiliano Pontil
- 2018 Oral: Bilevel Programming for Hyperparameter Optimization and Meta-Learning
  Luca Franceschi · Paolo Frasconi · Saverio Salzo · Riccardo Grazzi · Massimiliano Pontil
- 2017 Poster: Forward and Reverse Gradient-Based Hyperparameter Optimization
  Luca Franceschi · Michele Donini · Paolo Frasconi · Massimiliano Pontil
- 2017 Talk: Forward and Reverse Gradient-Based Hyperparameter Optimization
  Luca Franceschi · Michele Donini · Paolo Frasconi · Massimiliano Pontil