Poster
Improved Sleeping Bandits with Stochastic Action Sets and Adversarial Rewards
Aadirupa Saha · Pierre Gaillard · Michal Valko
Thu Jul 16 06:00 AM -- 06:45 AM & Thu Jul 16 07:00 PM -- 07:45 PM (PDT)
In this paper, we consider the problem of sleeping bandits with stochastic action sets and adversarial rewards. In this setting, in contrast to most work on bandits, the actions may not be available at all times. For instance, in item recommendation some products might be out of stock. The best existing efficient (i.e., polynomial-time) algorithms for this problem only guarantee an $O(T^{2/3})$ upper bound on the regret. Yet, inefficient algorithms based on EXP4 can achieve $O(\sqrt{T})$. We provide a new computationally efficient algorithm, inspired by EXP3, that achieves regret of order $O(\sqrt{T})$ when the availabilities of the actions $i \in \mathcal{A}$ are independent. We then study the most general version of the problem, where at each round the available set is drawn from some unknown arbitrary distribution (i.e., without the independence assumption), and propose an efficient algorithm with an $O(\sqrt{2^K T})$ regret guarantee. Our theoretical results are corroborated by experimental evaluations.
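To make the setting concrete, here is a minimal sketch of the interaction protocol with a naive EXP3-style learner that renormalizes its exponential weights over whichever actions are awake in each round. This is an illustration only: it is not the algorithm proposed in the paper and carries none of the stated regret guarantees. The function name `sleeping_exp3_sketch`, the parameters `avail_probs` and `loss_fn`, and the choice to work with losses in $[0, 1]$ rather than rewards are all assumptions made for the example.

```python
import math
import random


def sleeping_exp3_sketch(K, T, avail_probs, loss_fn, eta, seed=0):
    """Naive EXP3-style learner for sleeping bandits (illustrative only).

    K           -- number of actions (hypothetical parameter name)
    T           -- number of rounds
    avail_probs -- avail_probs[i] is the probability that action i is
                   available in a round, independently of other actions
    loss_fn     -- loss_fn(t, i) returns the loss in [0, 1] of action i at
                   round t; stands in for the adversary
    eta         -- learning rate of the exponential weights
    """
    rng = random.Random(seed)
    cum_loss_est = [0.0] * K  # importance-weighted cumulative loss estimates
    total_loss = 0.0

    for t in range(T):
        # Stochastic action set: each action is awake independently.
        available = [i for i in range(K) if rng.random() < avail_probs[i]]
        if not available:
            continue  # nothing can be played this round

        # Exponential weights restricted to the available actions.
        # Subtracting the minimum keeps the exponentials numerically stable.
        base = min(cum_loss_est[i] for i in available)
        weights = [math.exp(-eta * (cum_loss_est[i] - base)) for i in available]
        z = sum(weights)
        probs = [w / z for w in weights]

        # Sample an available action and observe only its loss (bandit feedback).
        idx = rng.choices(range(len(available)), weights=probs)[0]
        action = available[idx]
        loss = loss_fn(t, action)
        total_loss += loss

        # Importance weighting: unbiased conditionally on the available set.
        cum_loss_est[action] += loss / probs[idx]

    return total_loss, cum_loss_est
```

For example, `sleeping_exp3_sketch(3, 200, [1.0, 1.0, 0.0], lambda t, i: float(i == 1), eta=0.1)` simulates three actions where the third is never available and the second always incurs loss 1.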
Author Information
Aadirupa Saha (Indian Institute of Science (IISc), Bangalore)
Pierre Gaillard (INRIA)
Michal Valko (DeepMind)
More from the Same Authors
-
2021 : Marginalized Operators for Off-Policy Reinforcement Learning »
Yunhao Tang · Mark Rowland · Remi Munos · Michal Valko -
2021 : Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret »
Jean Tarbouriech · Simon Du · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2022 Poster: From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses »
Daniil Tiapkin · Denis Belomestny · Eric Moulines · Alexey Naumov · Sergey Samsonov · Yunhao Tang · Michal Valko · Pierre MENARD -
2022 Oral: From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses »
Daniil Tiapkin · Denis Belomestny · Eric Moulines · Alexey Naumov · Sergey Samsonov · Yunhao Tang · Michal Valko · Pierre MENARD -
2022 Poster: Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times »
Daniele Calandriello · Luigi Carratino · Alessandro Lazaric · Michal Valko · Lorenzo Rosasco -
2022 Spotlight: Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times »
Daniele Calandriello · Luigi Carratino · Alessandro Lazaric · Michal Valko · Lorenzo Rosasco -
2022 Poster: Retrieval-Augmented Reinforcement Learning »
Anirudh Goyal · Abe Friesen · Andrea Banino · Theophane Weber · Nan Rosemary Ke · Adrià Puigdomenech Badia · Arthur Guez · Mehdi Mirza · Peter Humphreys · Ksenia Konyushkova · Michal Valko · Simon Osindero · Timothy Lillicrap · Nicolas Heess · Charles Blundell -
2022 Spotlight: Retrieval-Augmented Reinforcement Learning »
Anirudh Goyal · Abe Friesen · Andrea Banino · Theophane Weber · Nan Rosemary Ke · Adrià Puigdomenech Badia · Arthur Guez · Mehdi Mirza · Peter Humphreys · Ksenia Konyushkova · Michal Valko · Simon Osindero · Timothy Lillicrap · Nicolas Heess · Charles Blundell -
2022 Poster: Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models »
Viktor Bengs · Aadirupa Saha · Eyke Hüllermeier -
2022 Spotlight: Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models »
Viktor Bengs · Aadirupa Saha · Eyke Hüllermeier -
2022 Poster: Non-Stationary Dueling Bandits »
Aadirupa Saha · Shubham Gupta -
2022 Spotlight: Non-Stationary Dueling Bandits »
Aadirupa Saha · Shubham Gupta -
2022 Poster: Versatile Dueling Bandits: Best-of-both World Analyses for Learning from Relative Preferences »
Aadirupa Saha · Pierre Gaillard -
2022 Spotlight: Versatile Dueling Bandits: Best-of-both World Analyses for Learning from Relative Preferences »
Aadirupa Saha · Pierre Gaillard -
2022 Workshop: Complex feedback in online learning »
Rémy Degenne · Pierre Gaillard · Wouter Koolen · Aadirupa Saha -
2021 Poster: Fast active learning for pure exploration in reinforcement learning »
Pierre MENARD · Omar Darwiche Domingues · Anders Jonsson · Emilie Kaufmann · Edouard Leurent · Michal Valko -
2021 Poster: UCB Momentum Q-learning: Correcting the bias without forgetting »
Pierre MENARD · Omar Darwiche Domingues · Xuedong Shang · Michal Valko -
2021 Poster: Confidence-Budget Matching for Sequential Budgeted Learning »
Yonathan Efroni · Nadav Merlis · Aadirupa Saha · Shie Mannor -
2021 Poster: Adversarial Dueling Bandits »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2021 Spotlight: Fast active learning for pure exploration in reinforcement learning »
Pierre MENARD · Omar Darwiche Domingues · Anders Jonsson · Emilie Kaufmann · Edouard Leurent · Michal Valko -
2021 Spotlight: Confidence-Budget Matching for Sequential Budgeted Learning »
Yonathan Efroni · Nadav Merlis · Aadirupa Saha · Shie Mannor -
2021 Spotlight: Adversarial Dueling Bandits »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2021 Oral: UCB Momentum Q-learning: Correcting the bias without forgetting »
Pierre MENARD · Omar Darwiche Domingues · Xuedong Shang · Michal Valko -
2021 Poster: Kernel-Based Reinforcement Learning: A Finite-Time Analysis »
Omar Darwiche Domingues · Pierre Menard · Matteo Pirotta · Emilie Kaufmann · Michal Valko -
2021 Poster: Online A-Optimal Design and Active Linear Regression »
Xavier Fontaine · Pierre Perrault · Michal Valko · Vianney Perchet -
2021 Spotlight: Kernel-Based Reinforcement Learning: A Finite-Time Analysis »
Omar Darwiche Domingues · Pierre Menard · Matteo Pirotta · Emilie Kaufmann · Michal Valko -
2021 Spotlight: Online A-Optimal Design and Active Linear Regression »
Xavier Fontaine · Pierre Perrault · Michal Valko · Vianney Perchet -
2021 Poster: Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning »
Tadashi Kozuno · Yunhao Tang · Mark Rowland · Remi Munos · Steven Kapturowski · Will Dabney · Michal Valko · David Abel -
2021 Poster: Taylor Expansion of Discount Factors »
Yunhao Tang · Mark Rowland · Remi Munos · Michal Valko -
2021 Poster: Optimal regret algorithm for Pseudo-1d Bandit Convex Optimization »
Aadirupa Saha · Nagarajan Natarajan · Praneeth Netrapalli · Prateek Jain -
2021 Spotlight: Taylor Expansion of Discount Factors »
Yunhao Tang · Mark Rowland · Remi Munos · Michal Valko -
2021 Spotlight: Optimal regret algorithm for Pseudo-1d Bandit Convex Optimization »
Aadirupa Saha · Nagarajan Natarajan · Praneeth Netrapalli · Prateek Jain -
2021 Spotlight: Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning »
Tadashi Kozuno · Yunhao Tang · Mark Rowland · Remi Munos · Steven Kapturowski · Will Dabney · Michal Valko · David Abel -
2021 Poster: Dueling Convex Optimization »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2021 Spotlight: Dueling Convex Optimization »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2020 Poster: Monte-Carlo Tree Search as Regularized Policy Optimization »
Jean-Bastien Grill · Florent Altché · Yunhao Tang · Thomas Hubert · Michal Valko · Ioannis Antonoglou · Remi Munos -
2020 Poster: From PAC to Instance-Optimal Sample Complexity in the Plackett-Luce Model »
Aadirupa Saha · Aditya Gopalan -
2020 Poster: Gamification of Pure Exploration for Linear Bandits »
Rémy Degenne · Pierre Menard · Xuedong Shang · Michal Valko -
2020 Poster: Stochastic bandits with arm-dependent delays »
Anne Gael Manegueu · Claire Vernade · Alexandra Carpentier · Michal Valko -
2020 Poster: Budgeted Online Influence Maximization »
Pierre Perrault · Jennifer Healey · Zheng Wen · Michal Valko -
2020 Poster: Near-linear time Gaussian process optimization with adaptive batching and resparsification »
Daniele Calandriello · Luigi Carratino · Alessandro Lazaric · Michal Valko · Lorenzo Rosasco -
2020 Poster: Taylor Expansion Policy Optimization »
Yunhao Tang · Michal Valko · Remi Munos