In this paper, we propose a novel setting for Inverse Reinforcement Learning (IRL), namely "Learning from a Learner" (LfL). As opposed to standard IRL, it consists not in learning a reward by observing an optimal agent, but in learning from observations of another learning (and thus sub-optimal) agent. To do so, we leverage the fact that the observed agent's policy is assumed to improve over time. The ultimate goal of this approach is to recover the actual environment's reward and to allow the observer to outperform the learner. To recover that reward in practice, we propose methods based on the entropy-regularized policy iteration framework. We discuss different approaches to learn solely from trajectories in the state-action space. We demonstrate the generality of our method by observing agents implementing various reinforcement learning algorithms. Finally, we show that, on both discrete and continuous state/action tasks, the performance of the observer (which optimizes the recovered reward) can surpass that of the observed agent.
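The entropy-regularized policy iteration framework that the recovery methods build on can be sketched on a toy tabular MDP. This is a minimal illustrative sketch, not the paper's algorithm or experimental setup: the random MDP, temperature `tau`, and discount `gamma` below are assumptions chosen for demonstration.

```python
import numpy as np

# Toy tabular MDP (illustrative assumption, not the paper's benchmarks).
np.random.seed(0)
n_states, n_actions = 3, 2
gamma, tau = 0.9, 1.0  # discount factor, entropy temperature

# P[s, a] is a distribution over next states; R[s, a] is the reward.
P = np.random.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = np.random.rand(n_states, n_actions)

def softmax(q, tau):
    """Row-wise softmax with temperature (numerically stabilized)."""
    z = (q - q.max(axis=1, keepdims=True)) / tau
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

pi = np.full((n_states, n_actions), 1.0 / n_actions)  # start uniform
for _ in range(200):
    # Soft policy evaluation: V solves
    #   V = E_pi[R - tau * log pi] + gamma * P_pi V  (linear system).
    r_pi = (pi * (R - tau * np.log(pi))).sum(axis=1)
    P_pi = np.einsum('sa,san->sn', pi, P)
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    # Soft policy improvement: softmax over the Q-values.
    Q = R + gamma * P @ V
    pi = softmax(Q, tau)
```

The softmax improvement step is what makes the scheme entropy-regularized: instead of a greedy argmax, each iteration produces a stochastic policy, and successive policies improve monotonically, which is the structure LfL exploits in the observed learner.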
Author Information
Alexis Jacq (EPFL)
Matthieu Geist (Google)
Ana Paiva (INESC-ID, U. of Lisbon)
Olivier Pietquin (Google Brain)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Poster: Learning from a Learner »
  Wed. Jun 12th 01:30 -- 04:00 AM, Room Pacific Ballroom #110
More from the Same Authors
- 2021: A functional mirror ascent view of policy gradient methods with function approximation »
  Sharan Vaswani · Olivier Bachem · Simone Totaro · Matthieu Geist · Marlos C. Machado · Pablo Samuel Castro · Nicolas Le Roux
- 2021: Offline Reinforcement Learning as Anti-Exploration »
  Shideh Rezaeifar · Robert Dadashi · Nino Vieillard · Léonard Hussenot · Olivier Bachem · Olivier Pietquin · Matthieu Geist
- 2023 Poster: A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning »
  Benjamin Eysenbach · Matthieu Geist · Ruslan Salakhutdinov · Sergey Levine
- 2023 Poster: Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice »
  Toshinori Kitamura · Tadashi Kozuno · Yunhao Tang · Nino Vieillard · Michal Valko · Wenhao Yang · Jincheng Mei · Pierre Menard · Mohammad Gheshlaghi Azar · Remi Munos · Olivier Pietquin · Matthieu Geist · Csaba Szepesvari · Wataru Kumagai · Yutaka Matsuo
- 2023 Poster: Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games »
  Batuhan Yardim · Semih Cayci · Matthieu Geist · Niao He
- 2022 Poster: Large Batch Experience Replay »
  Thibault Lahire · Matthieu Geist · Emmanuel Rachelson
- 2022 Poster: Continuous Control with Action Quantization from Demonstrations »
  Robert Dadashi · Léonard Hussenot · Damien Vincent · Sertan Girgin · Anton Raichuk · Matthieu Geist · Olivier Pietquin
- 2022 Oral: Large Batch Experience Replay »
  Thibault Lahire · Matthieu Geist · Emmanuel Rachelson
- 2022 Spotlight: Continuous Control with Action Quantization from Demonstrations »
  Robert Dadashi · Léonard Hussenot · Damien Vincent · Sertan Girgin · Anton Raichuk · Matthieu Geist · Olivier Pietquin
- 2022 Poster: Geometric Multimodal Contrastive Representation Learning »
  Petra Poklukar · Miguel Vasco · Hang Yin · Francisco S. Melo · Ana Paiva · Danica Kragic
- 2022 Spotlight: Geometric Multimodal Contrastive Representation Learning »
  Petra Poklukar · Miguel Vasco · Hang Yin · Francisco S. Melo · Ana Paiva · Danica Kragic
- 2022 Poster: Scalable Deep Reinforcement Learning Algorithms for Mean Field Games »
  Mathieu Lauriere · Sarah Perrin · Sertan Girgin · Paul Muller · Ayush Jain · Theophile Cabannes · Georgios Piliouras · Julien Perolat · Romuald Elie · Olivier Pietquin · Matthieu Geist
- 2022 Spotlight: Scalable Deep Reinforcement Learning Algorithms for Mean Field Games »
  Mathieu Lauriere · Sarah Perrin · Sertan Girgin · Paul Muller · Ayush Jain · Theophile Cabannes · Georgios Piliouras · Julien Perolat · Romuald Elie · Olivier Pietquin · Matthieu Geist
- 2021 Poster: Hyperparameter Selection for Imitation Learning »
  Léonard Hussenot · Marcin Andrychowicz · Damien Vincent · Robert Dadashi · Anton Raichuk · Sabela Ramos · Nikola Momchev · Sertan Girgin · Raphael Marinier · Lukasz Stafiniak · Emmanuel Orsini · Olivier Bachem · Matthieu Geist · Olivier Pietquin
- 2021 Oral: Hyperparameter Selection for Imitation Learning »
  Léonard Hussenot · Marcin Andrychowicz · Damien Vincent · Robert Dadashi · Anton Raichuk · Sabela Ramos · Nikola Momchev · Sertan Girgin · Raphael Marinier · Lukasz Stafiniak · Emmanuel Orsini · Olivier Bachem · Matthieu Geist · Olivier Pietquin
- 2021 Poster: Offline Reinforcement Learning with Pseudometric Learning »
  Robert Dadashi · Shideh Rezaeifar · Nino Vieillard · Léonard Hussenot · Olivier Pietquin · Matthieu Geist
- 2021 Spotlight: Offline Reinforcement Learning with Pseudometric Learning »
  Robert Dadashi · Shideh Rezaeifar · Nino Vieillard · Léonard Hussenot · Olivier Pietquin · Matthieu Geist
- 2019 Poster: A Theory of Regularized Markov Decision Processes »
  Matthieu Geist · Bruno Scherrer · Olivier Pietquin
- 2019 Oral: A Theory of Regularized Markov Decision Processes »
  Matthieu Geist · Bruno Scherrer · Olivier Pietquin