Spotlight
Fast active learning for pure exploration in reinforcement learning
Pierre Ménard · Omar Darwiche Domingues · Anders Jonsson · Emilie Kaufmann · Edouard Leurent · Michal Valko
Realistic environments often provide agents with very limited feedback.
When the environment is initially unknown, feedback may be completely absent at first,
and the agents may choose to devote all their effort to \emph{exploring efficiently.}
Exploration remains a challenge: it has been addressed with many hand-tuned heuristics of varying
generality on one side, and with a few theoretically backed exploration strategies on the other.
Many of these are incarnated by \emph{intrinsic motivation} and in particular by \emph{exploration bonuses}.
A common choice is a $1/\sqrt{n}$ bonus,
where $n$ is the number of times the state-action pair in question has been visited.
We show that, surprisingly, for the pure-exploration objective of \emph{reward-free exploration}, bonuses that scale with $1/n$ bring faster learning rates, improving the known upper bounds with respect to the dependence on the horizon $H$.
Furthermore, we show that with an improved analysis of the stopping time, we can improve the sample complexity by a factor of $H$
in the \emph{best-policy identification} setting, another pure-exploration objective, in which the environment provides rewards but the agent is not penalized for its behavior during the
exploration phase.
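As an illustrative sketch only (this is not the paper's algorithm; the bonus scale and the count bookkeeping below are simplified assumptions), the two bonus schedules discussed in the abstract can be compared side by side:

```python
import math
from collections import defaultdict

# Hypothetical count table: maps (state, action) pairs to visit counts n.
visit_counts = defaultdict(int)

def sqrt_bonus(n: int, scale: float = 1.0) -> float:
    """Classic optimistic bonus scaling as 1/sqrt(n)."""
    return scale / math.sqrt(max(n, 1))

def fast_bonus(n: int, scale: float = 1.0) -> float:
    """Bonus scaling as 1/n, the faster-shrinking schedule studied
    here for reward-free exploration."""
    return scale / max(n, 1)

# The 1/n bonus decays much faster as a pair is revisited:
for n in (1, 10, 100):
    print(f"n={n:3d}  1/sqrt(n)={sqrt_bonus(n):.3f}  1/n={fast_bonus(n):.3f}")
```

The faster decay means the incentive to revisit a well-explored state-action pair vanishes more quickly, which is the intuition behind the improved learning rates.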
Author Information
Pierre Ménard (OvGU)
Omar Darwiche Domingues (Inria)
Anders Jonsson (Universitat Pompeu Fabra)
Emilie Kaufmann (CNRS, Univ. Lille, Inria)
Edouard Leurent
Michal Valko (DeepMind / Inria / ENS Paris-Saclay)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Poster: Fast active learning for pure exploration in reinforcement learning »
Thu. Jul 22nd 04:00 -- 06:00 AM
More from the Same Authors
-
2021 : Collision Resolution in Multi-player Bandits Without Observing Collision Information »
Eleni Nisioti · Nikolaos Thomos · Boris Bellalta · Anders Jonsson -
2021 : Marginalized Operators for Off-Policy Reinforcement Learning »
Yunhao Tang · Mark Rowland · Remi Munos · Michal Valko -
2021 : Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret »
Jean Tarbouriech · Simon Du · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2021 : Density-Based Bonuses on Learned Representations for Reward-Free Exploration in Deep Reinforcement Learning »
Omar Darwiche Domingues · Corentin Tallec · Remi Munos · Michal Valko -
2022 Poster: From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses »
Daniil Tiapkin · Denis Belomestny · Eric Moulines · Alexey Naumov · Sergey Samsonov · Yunhao Tang · Michal Valko · Pierre Ménard -
2022 Oral: From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses »
Daniil Tiapkin · Denis Belomestny · Eric Moulines · Alexey Naumov · Sergey Samsonov · Yunhao Tang · Michal Valko · Pierre Ménard -
2022 Poster: Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times »
Daniele Calandriello · Luigi Carratino · Alessandro Lazaric · Michal Valko · Lorenzo Rosasco -
2022 Spotlight: Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times »
Daniele Calandriello · Luigi Carratino · Alessandro Lazaric · Michal Valko · Lorenzo Rosasco -
2022 Poster: Retrieval-Augmented Reinforcement Learning »
Anirudh Goyal · Abe Friesen · Andrea Banino · Theophane Weber · Nan Rosemary Ke · Adrià Puigdomenech Badia · Arthur Guez · Mehdi Mirza · Peter Humphreys · Ksenia Konyushkova · Michal Valko · Simon Osindero · Timothy Lillicrap · Nicolas Heess · Charles Blundell -
2022 Spotlight: Retrieval-Augmented Reinforcement Learning »
Anirudh Goyal · Abe Friesen · Andrea Banino · Theophane Weber · Nan Rosemary Ke · Adrià Puigdomenech Badia · Arthur Guez · Mehdi Mirza · Peter Humphreys · Ksenia Konyushkova · Michal Valko · Simon Osindero · Timothy Lillicrap · Nicolas Heess · Charles Blundell -
2022 : Emilie Kaufmann »
Emilie Kaufmann -
2021 : Invited Speaker: Emilie Kaufmann: On pure-exploration in Markov Decision Processes »
Emilie Kaufmann -
2021 Poster: Optimal Thompson Sampling strategies for support-aware CVaR bandits »
Dorian Baudry · Romain Gautron · Emilie Kaufmann · Odalric-Ambrym Maillard -
2021 Poster: Problem Dependent View on Structured Thresholding Bandit Problems »
James Cheshire · Pierre Ménard · Alexandra Carpentier -
2021 Spotlight: Problem Dependent View on Structured Thresholding Bandit Problems »
James Cheshire · Pierre Ménard · Alexandra Carpentier -
2021 Spotlight: Optimal Thompson Sampling strategies for support-aware CVaR bandits »
Dorian Baudry · Romain Gautron · Emilie Kaufmann · Odalric-Ambrym Maillard -
2021 Poster: UCB Momentum Q-learning: Correcting the bias without forgetting »
Pierre Ménard · Omar Darwiche Domingues · Xuedong Shang · Michal Valko -
2021 Oral: UCB Momentum Q-learning: Correcting the bias without forgetting »
Pierre Ménard · Omar Darwiche Domingues · Xuedong Shang · Michal Valko -
2021 Poster: Kernel-Based Reinforcement Learning: A Finite-Time Analysis »
Omar Darwiche Domingues · Pierre Menard · Matteo Pirotta · Emilie Kaufmann · Michal Valko -
2021 Poster: Online A-Optimal Design and Active Linear Regression »
Xavier Fontaine · Pierre Perrault · Michal Valko · Vianney Perchet -
2021 Spotlight: Kernel-Based Reinforcement Learning: A Finite-Time Analysis »
Omar Darwiche Domingues · Pierre Menard · Matteo Pirotta · Emilie Kaufmann · Michal Valko -
2021 Spotlight: Online A-Optimal Design and Active Linear Regression »
Xavier Fontaine · Pierre Perrault · Michal Valko · Vianney Perchet -
2021 Poster: Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning »
Tadashi Kozuno · Yunhao Tang · Mark Rowland · Remi Munos · Steven Kapturowski · Will Dabney · Michal Valko · David Abel -
2021 Poster: Taylor Expansion of Discount Factors »
Yunhao Tang · Mark Rowland · Remi Munos · Michal Valko -
2021 Spotlight: Taylor Expansion of Discount Factors »
Yunhao Tang · Mark Rowland · Remi Munos · Michal Valko -
2021 Spotlight: Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning »
Tadashi Kozuno · Yunhao Tang · Mark Rowland · Remi Munos · Steven Kapturowski · Will Dabney · Michal Valko · David Abel -
2020 : Short Talk 3 - A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces »
Omar Darwiche Domingues -
2020 Poster: Monte-Carlo Tree Search as Regularized Policy Optimization »
Jean-Bastien Grill · Florent Altché · Yunhao Tang · Thomas Hubert · Michal Valko · Ioannis Antonoglou · Remi Munos -
2020 Poster: Improved Sleeping Bandits with Stochastic Action Sets and Adversarial Rewards »
Aadirupa Saha · Pierre Gaillard · Michal Valko -
2020 Poster: Gamification of Pure Exploration for Linear Bandits »
Rémy Degenne · Pierre Menard · Xuedong Shang · Michal Valko -
2020 Poster: Stochastic bandits with arm-dependent delays »
Anne Gael Manegueu · Claire Vernade · Alexandra Carpentier · Michal Valko -
2020 Poster: Budgeted Online Influence Maximization »
Pierre Perrault · Jennifer Healey · Zheng Wen · Michal Valko -
2020 Poster: Near-linear time Gaussian process optimization with adaptive batching and resparsification »
Daniele Calandriello · Luigi Carratino · Alessandro Lazaric · Michal Valko · Lorenzo Rosasco -
2020 Poster: Taylor Expansion Policy Optimization »
Yunhao Tang · Michal Valko · Remi Munos