Counterfactual Risk Minimization (CRM) is a framework for dealing with the logged bandit feedback problem, where the goal is to improve a logging policy using offline data. In this paper, we explore the case where it is possible to deploy learned policies multiple times and acquire new data. We extend the CRM principle and its theory to this scenario, which we call "Sequential Counterfactual Risk Minimization (SCRM)." We introduce a novel counterfactual estimator and identify conditions that can improve the performance of CRM in terms of excess risk and regret rates, by using an analysis similar to restart strategies in accelerated optimization methods. We also provide an empirical evaluation of our method in both discrete and continuous action settings, and demonstrate the benefits of multiple deployments of CRM.
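As a rough illustration of the setting described in the abstract, the sketch below alternates between deploying a policy, logging bandit feedback, and selecting the next policy by minimizing a counterfactual risk estimate. It is not the estimator introduced in the paper: it assumes the standard clipped inverse propensity scoring (IPS) estimator, a context-free problem with a small discrete action set, and a finite list of candidate policies. All names (`ips_risk`, `collect_logs`, `sequential_crm`) and the simulated Bernoulli losses are hypothetical.

```python
import random


def ips_risk(logs, target_probs, clip=None):
    """Clipped IPS estimate of the expected loss of a target policy.

    logs: list of (action, loss, logging_propensity) tuples.
    target_probs: probability the target policy assigns to each action.
    clip: optional cap on importance weights (lower variance, some bias).
    """
    total = 0.0
    for action, loss, propensity in logs:
        weight = target_probs[action] / propensity
        if clip is not None:
            weight = min(weight, clip)
        total += weight * loss
    return total / len(logs)


def collect_logs(policy, expected_loss, n, rng):
    """Simulate one deployment: sample actions from `policy`, observe 0/1 losses."""
    logs = []
    for _ in range(n):
        action = rng.choices(range(len(policy)), weights=policy)[0]
        loss = 1.0 if rng.random() < expected_loss[action] else 0.0
        logs.append((action, loss, policy[action]))
    return logs


def sequential_crm(candidates, logging_policy, expected_loss, rounds, n, seed=0):
    """Assumed loop: deploy, log, pick the candidate with lowest estimated risk."""
    rng = random.Random(seed)
    policy = logging_policy
    for _ in range(rounds):
        logs = collect_logs(policy, expected_loss, n, rng)
        policy = min(candidates, key=lambda p: ips_risk(logs, p, clip=10.0))
    return policy


if __name__ == "__main__":
    candidates = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
    final = sequential_crm(candidates, [0.5, 0.5], [0.9, 0.1], rounds=3, n=2000)
    print("policy after three deployments:", final)
```

Each round uses only the data logged by the most recently deployed policy, which keeps the importance weights simple; how data from earlier rounds should be reused or weighted is a design choice this sketch does not address.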
Author Information
Houssam Zenati (Criteo, INRIA)
Eustache Diemert (Criteo AI Lab)
Matthieu Martin (Swiss Federal Institute of Technology Lausanne)
Julien Mairal (Inria)
Pierre Gaillard (INRIA)
More from the Same Authors
- 2022 Workshop: Continuous Time Perspectives in Machine Learning
  Mihaela Rosca · Chongli Qin · Julien Mairal · Marc Deisenroth
- 2022 Workshop: Complex feedback in online learning
  Rémy Degenne · Pierre Gaillard · Wouter Koolen · Aadirupa Saha
- 2022 Poster: Nested Bandits
  Matthieu Martin · Panayotis Mertikopoulos · Thibaud J Rahier · Houssam Zenati
- 2022 Poster: Versatile Dueling Bandits: Best-of-both World Analyses for Learning from Relative Preferences
  Aadirupa Saha · Pierre Gaillard
- 2022 Spotlight: Versatile Dueling Bandits: Best-of-both World Analyses for Learning from Relative Preferences
  Aadirupa Saha · Pierre Gaillard
- 2022 Spotlight: Nested Bandits
  Matthieu Martin · Panayotis Mertikopoulos · Thibaud J Rahier · Houssam Zenati
- 2020 Poster: Convolutional Kernel Networks for Graph-Structured Data
  Dexiong Chen · Laurent Jacob · Julien Mairal
- 2020 Poster: Improved Sleeping Bandits with Stochastic Action Sets and Adversarial Rewards
  Aadirupa Saha · Pierre Gaillard · Michal Valko
- 2019 Poster: Estimate Sequences for Variance-Reduced Stochastic Composite Optimization
  Andrei Kulunchakov · Julien Mairal
- 2019 Oral: Estimate Sequences for Variance-Reduced Stochastic Composite Optimization
  Andrei Kulunchakov · Julien Mairal
- 2019 Invited Talk: Online Dictionary Learning for Sparse Coding
  Julien Mairal · Francis Bach · Jean Ponce · Guillermo Sapiro
- 2019 Poster: A Kernel Perspective for Regularizing Deep Neural Networks
  Alberto Bietti · Gregoire Mialon · Dexiong Chen · Julien Mairal
- 2019 Oral: A Kernel Perspective for Regularizing Deep Neural Networks
  Alberto Bietti · Gregoire Mialon · Dexiong Chen · Julien Mairal