Reinforcement learning aims at autonomously performing complex tasks. To this end, a reward signal is used to steer the learning process. While successful in many circumstances, the approach is typically data hungry, requiring large amounts of task-specific interaction between agent and environment to learn efficient behaviors. To alleviate this, unsupervised reinforcement learning proposes to collect data through self-supervised interaction to accelerate task-specific adaptation. However, whether current unsupervised strategies lead to improved generalization capabilities is still unclear, more so when the input observations are high-dimensional. In this work, we advance the field by closing the performance gap in the Unsupervised Reinforcement Learning Benchmark, a collection of tasks to be solved in a data-efficient manner, after interacting with the environment in a self-supervised way. Our model-based approach combines exploration and planning to efficiently fine-tune unsupervised pre-trained models, achieving comparable results to task-specific baselines. We extensively evaluate our work, comparing several exploration methods and improving fine-tuning by studying the interaction between the model components. Furthermore, we investigate the limits of the learned model and the unsupervised methods to gain insights into how these influence the decision process, shedding light on new research directions.
Author Information
Sai Rajeswar (University of Montreal)
Pietro Mazzaglia (Ghent University)
Tim Verbelen (Ghent University - imec)
Alex Piche (Mila)
Bart Dhoedt (Ghent University)
Aaron Courville (University of Montreal)
Alexandre Lacoste (Element AI)
More from the Same Authors
- 2021: Gradient Starvation: A Learning Proclivity in Neural Networks
  Mohammad Pezeshki · Sékou-Oumar Kaba · Yoshua Bengio · Aaron Courville · Doina Precup · Guillaume Lajoie
- 2023: Do as your neighbors: Invariant learning through non-parametric neighbourhood matching
  Andrei Nicolicioiu · Jerry Huang · Dhanya Sridhar · Aaron Courville
- 2023: Learning with Learning Awareness using Meta-Values
  Tim Cooijmans · Milad Aghajohari · Aaron Courville
- 2023: Causal Discovery with Language Models as Imperfect Experts
  Stephanie Long · Alex Piche · Valentina Zantedeschi · Tibor Schuster · Alexandre Drouin
- 2023: Inferring Hierarchical Structure in Multi-Room Maze Environments
  Daria de Tinguy · Toon Van de Maele · Tim Verbelen · Bart Dhoedt
- 2023 Oral: Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels
  Sai Rajeswar · Pietro Mazzaglia · Tim Verbelen · Alex Piche · Bart Dhoedt · Aaron Courville · Alexandre Lacoste
- 2023 Poster: Bigger, Better, Faster: Human-level Atari with human-level efficiency
  Max Schwarzer · Johan Obando Ceron · Aaron Courville · Marc Bellemare · Rishabh Agarwal · Pablo Samuel Castro
- 2023 Poster: Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels
  Sai Rajeswar · Pietro Mazzaglia · Tim Verbelen · Alex Piche · Bart Dhoedt · Aaron Courville · Alexandre Lacoste
- 2019 Poster: On the Spectral Bias of Neural Networks
  Nasim Rahaman · Aristide Baratin · Devansh Arpit · Felix Draxler · Min Lin · Fred Hamprecht · Yoshua Bengio · Aaron Courville
- 2019 Oral: On the Spectral Bias of Neural Networks
  Nasim Rahaman · Aristide Baratin · Devansh Arpit · Felix Draxler · Min Lin · Fred Hamprecht · Yoshua Bengio · Aaron Courville
- 2018 Poster: Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data
  Amjad Almahairi · Sai Rajeswar · Alessandro Sordoni · Philip Bachman · Aaron Courville
- 2018 Poster: Mutual Information Neural Estimation
  Mohamed Belghazi · Aristide Baratin · Sai Rajeswar · Sherjil Ozair · Yoshua Bengio · R Devon Hjelm · Aaron Courville
- 2018 Oral: Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data
  Amjad Almahairi · Sai Rajeswar · Alessandro Sordoni · Philip Bachman · Aaron Courville
- 2018 Oral: Mutual Information Neural Estimation
  Mohamed Belghazi · Aristide Baratin · Sai Rajeswar · Sherjil Ozair · Yoshua Bengio · R Devon Hjelm · Aaron Courville
- 2017 Poster: A Closer Look at Memorization in Deep Networks
  David Krueger · Yoshua Bengio · Stanislaw Jastrzebski · Maxinder S. Kanwal · Nicolas Ballas · Asja Fischer · Emmanuel Bengio · Devansh Arpit · Tegan Maharaj · Aaron Courville · Simon Lacoste-Julien
- 2017 Talk: A Closer Look at Memorization in Deep Networks
  David Krueger · Yoshua Bengio · Stanislaw Jastrzebski · Maxinder S. Kanwal · Nicolas Ballas · Asja Fischer · Emmanuel Bengio · Devansh Arpit · Tegan Maharaj · Aaron Courville · Simon Lacoste-Julien