
Meta-learning with Stochastic Linear Bandits
Leonardo Cella · Alessandro Lazaric · Massimiliano Pontil

Tue Jul 14 01:00 PM -- 01:45 PM & Wed Jul 15 12:00 AM -- 12:45 AM (PDT)

We investigate meta-learning procedures in the setting of stochastic linear bandit tasks. The goal is to select a learning algorithm that works well on average over a class of bandit tasks sampled from a task distribution. Inspired by recent work on learning-to-learn linear regression, we consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularizer is the squared Euclidean distance to a bias vector. We first study the benefit of the biased OFUL algorithm in terms of regret minimization. We then propose two strategies to estimate the bias within the learning-to-learn setting. We show, both theoretically and experimentally, that when the number of tasks grows and the variance of the task distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
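The core object described in the abstract is a ridge-style estimator regularized toward a bias vector h rather than toward zero, plugged into an OFUL-style optimistic arm selection. The following is a minimal sketch of that idea, not the authors' implementation; the function names, the confidence scaling beta, and the regularization weight lam are illustrative assumptions. Minimizing ||X theta - y||^2 + lam ||theta - h||^2 has the closed form h + (X^T X + lam I)^{-1} X^T (y - X h), which reduces to standard ridge regression when h = 0.

```python
import numpy as np

def biased_ridge_estimate(X, y, h, lam=1.0):
    """Least squares regularized toward a bias vector h (sketch).

    Solves argmin_theta ||X theta - y||^2 + lam * ||theta - h||^2
    via the closed form h + (X^T X + lam I)^{-1} X^T (y - X h).
    X: (n, d) observed arm features, y: (n,) observed rewards.
    """
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    return h + np.linalg.solve(A, X.T @ (y - X @ h))

def optimistic_arm(arms, theta_hat, A_inv, beta):
    """OFUL-style selection: maximize x^T theta_hat + beta * ||x||_{A^{-1}}.

    arms: (k, d) candidate arm features; beta is a confidence-width
    parameter (hypothetical constant here; in OFUL it grows with t).
    """
    widths = np.sqrt(np.einsum('ij,jk,ik->i', arms, A_inv, arms))
    return int(np.argmax(arms @ theta_hat + beta * widths))
```

A natural bias estimate in the learning-to-learn setting is an average of per-task parameter estimates; when tasks are drawn from a low-variance distribution around a common center, pulling theta toward that average shrinks the effective complexity of each new task, which is the regime where the abstract reports gains over learning tasks in isolation.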

Author Information

Leonardo Cella (University of Milan)
Alessandro Lazaric (Facebook AI Research)
Massimiliano Pontil (Istituto Italiano di Tecnologia and University College London)
