We investigate meta-learning procedures in the setting of stochastic linear bandit tasks. The goal is to select a learning algorithm that works well on average over a class of bandit tasks sampled from a task distribution. Inspired by recent work on learning-to-learn linear regression, we consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularizer is the squared Euclidean distance to a bias vector. We first study the benefit of this biased OFUL algorithm in terms of regret minimization. We then propose two strategies to estimate the bias within the learning-to-learn setting. We show, both theoretically and experimentally, that when the number of tasks grows and the variance of the task distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
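To make the biased regularizer concrete: minimizing sum_s (y_s - x_s^T theta)^2 + lam * ||theta - h||^2 over theta gives the closed form theta_hat = (X^T X + lam * I)^{-1} (X^T y + lam * h). Before any data is observed this returns h itself, so the bias acts as the point the estimate is shrunk toward. The pure-Python sketch below illustrates this under stated assumptions and is not the authors' implementation. The confidence radius `beta` is a fixed user-supplied constant, whereas OFUL proper uses a data-dependent radius. The helper `average_bias`, which averages previous tasks' estimates to form the next bias, is a hypothetical meta-step chosen for illustration; the abstract does not say whether it matches either of the two proposed strategies. All names (`solve`, `BiasedOFUL`, `average_bias`) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def solve(A: Matrix, rhs: Sequence[float]) -> Vector:
    """Solve A z = rhs by Gaussian elimination with partial pivoting (A assumed invertible)."""
    n = len(rhs)
    M = [list(map(float, A[i])) + [float(rhs[i])] for i in range(n)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        acc = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - acc) / M[i][i]
    return z


class BiasedOFUL:
    """Hypothetical OFUL-style learner with ridge penalty lam * ||theta - h||^2."""

    def __init__(self, h: Sequence[float], lam: float, beta: float):
        self.d = len(h)
        self.h = [float(v) for v in h]
        self.lam = lam
        self.beta = beta  # assumption: fixed confidence radius, not OFUL's data-dependent one
        # V = lam * I + sum_s x_s x_s^T
        self.V = [[lam if i == j else 0.0 for j in range(self.d)] for i in range(self.d)]
        # u = sum_s y_s x_s
        self.u = [0.0] * self.d

    def estimate(self) -> Vector:
        """theta_hat = V^{-1} (X^T y + lam * h); equals h before any data arrives."""
        rhs = [self.u[i] + self.lam * self.h[i] for i in range(self.d)]
        return solve(self.V, rhs)

    def select(self, arms: Sequence[Sequence[float]]) -> int:
        """Index of the arm maximising x^T theta_hat + beta * sqrt(x^T V^{-1} x)."""
        theta = self.estimate()

        def ucb(x: Sequence[float]) -> float:
            mean = sum(a * b for a, b in zip(x, theta))
            quad = sum(a * b for a, b in zip(x, solve(self.V, x)))
            return mean + self.beta * math.sqrt(max(0.0, quad))

        return max(range(len(arms)), key=lambda k: ucb(arms[k]))

    def update(self, x: Sequence[float], reward: float) -> None:
        """Add one observation (x, reward) to V and u."""
        for i in range(self.d):
            self.u[i] += reward * x[i]
            for j in range(self.d):
                self.V[i][j] += x[i] * x[j]


def average_bias(task_estimates: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical meta-step: use the mean of past tasks' estimates as the next bias h."""
    n = len(task_estimates)
    return [sum(est[i] for est in task_estimates) / n for i in range(len(task_estimates[0]))]


learner = BiasedOFUL(h=[0.5, 0.5], lam=1.0, beta=0.1)
print(learner.estimate())  # [0.5, 0.5]
```

Setting h to the zero vector recovers ordinary ridge-regularized OFUL, so in this sketch the bias changes only the point toward which the estimate is shrunk; its influence fades as X^T X grows relative to lam * I.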
Author Information
Leonardo Cella (University of Milan)
Alessandro Lazaric (Facebook AI Research)
Massimiliano Pontil (Istituto Italiano di Tecnologia and University College London)
More from the Same Authors
2021 : Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection
Matteo Papini · Andrea Tirinzoni · Aldo Pacchiano · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta
2021 : A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs
Andrea Tirinzoni · Matteo Pirotta · Alessandro Lazaric
2021 : Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret
Jean Tarbouriech · Simon Du · Matteo Pirotta · Michal Valko · Alessandro Lazaric
2021 : A general sample complexity analysis of vanilla policy gradient
Rui Yuan · Robert Gower · Alessandro Lazaric
2021 : Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching
Pierre-Alexandre Kamienny · Jean Tarbouriech · Alessandro Lazaric · Ludovic Denoyer
2021 : Exploration-Driven Representation Learning in Reinforcement Learning
Akram Erraqabi · Mingde Zhao · Marlos C. Machado · Yoshua Bengio · Sainbayar Sukhbaatar · Ludovic Denoyer · Alessandro Lazaric
2023 Poster: Layered State Discovery for Incremental Autonomous Exploration
Liyu Chen · Andrea Tirinzoni · Alessandro Lazaric · Matteo Pirotta
2022 Workshop: Responsible Decision Making in Dynamic Environments
Virginie Do · Thorsten Joachims · Alessandro Lazaric · Joelle Pineau · Matteo Pirotta · Harsh Satija · Nicolas Usunier
2022 Poster: Distribution Regression with Sliced Wasserstein Kernels
Dimitri Marie Meunier · Massimiliano Pontil · Carlo Ciliberto
2022 Spotlight: Distribution Regression with Sliced Wasserstein Kernels
Dimitri Marie Meunier · Massimiliano Pontil · Carlo Ciliberto
2022 Poster: Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times
Daniele Calandriello · Luigi Carratino · Alessandro Lazaric · Michal Valko · Lorenzo Rosasco
2022 Spotlight: Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times
Daniele Calandriello · Luigi Carratino · Alessandro Lazaric · Michal Valko · Lorenzo Rosasco
2021 : Invited Talk by Alessandro Lazaric
Alessandro Lazaric
2021 Poster: Leveraging Good Representations in Linear Contextual Bandits
Matteo Papini · Andrea Tirinzoni · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta
2021 Spotlight: Leveraging Good Representations in Linear Contextual Bandits
Matteo Papini · Andrea Tirinzoni · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta
2021 Poster: Best Model Identification: A Rested Bandit Formulation
Leonardo Cella · Massimiliano Pontil · Claudio Gentile
2021 Spotlight: Best Model Identification: A Rested Bandit Formulation
Leonardo Cella · Massimiliano Pontil · Claudio Gentile
2021 Poster: Reinforcement Learning with Prototypical Representations
Denis Yarats · Rob Fergus · Alessandro Lazaric · Lerrel Pinto
2021 Spotlight: Reinforcement Learning with Prototypical Representations
Denis Yarats · Rob Fergus · Alessandro Lazaric · Lerrel Pinto
2020 Poster: No-Regret Exploration in Goal-Oriented Reinforcement Learning
Jean Tarbouriech · Evrard Garcelon · Michal Valko · Matteo Pirotta · Alessandro Lazaric
2020 Poster: Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation
Marc Abeille · Alessandro Lazaric
2020 Poster: Learning Near Optimal Policies with Low Inherent Bellman Error
Andrea Zanette · Alessandro Lazaric · Mykel Kochenderfer · Emma Brunskill
2020 Poster: On the Iteration Complexity of Hypergradient Computation
Riccardo Grazzi · Luca Franceschi · Massimiliano Pontil · Saverio Salzo
2020 Poster: Near-linear time Gaussian process optimization with adaptive batching and resparsification
Daniele Calandriello · Luigi Carratino · Alessandro Lazaric · Michal Valko · Lorenzo Rosasco
2019 Poster: Learning Discrete Structures for Graph Neural Networks
Luca Franceschi · Mathias Niepert · Massimiliano Pontil · Xiao He
2019 Oral: Learning Discrete Structures for Graph Neural Networks
Luca Franceschi · Mathias Niepert · Massimiliano Pontil · Xiao He
2019 Poster: Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction
Giulia Luise · Dimitrios Stamos · Massimiliano Pontil · Carlo Ciliberto
2019 Oral: Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction
Giulia Luise · Dimitrios Stamos · Massimiliano Pontil · Carlo Ciliberto
2018 Poster: Improved large-scale graph learning through ridge spectral sparsification
Daniele Calandriello · Alessandro Lazaric · Ioannis Koutis · Michal Valko
2018 Oral: Improved large-scale graph learning through ridge spectral sparsification
Daniele Calandriello · Alessandro Lazaric · Ioannis Koutis · Michal Valko
2018 Poster: Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning
Ronan Fruit · Matteo Pirotta · Alessandro Lazaric · Ronald Ortner
2018 Poster: Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems
Marc Abeille · Alessandro Lazaric
2018 Oral: Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems
Marc Abeille · Alessandro Lazaric
2018 Oral: Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning
Ronan Fruit · Matteo Pirotta · Alessandro Lazaric · Ronald Ortner