The Primacy Bias in Deep Reinforcement Learning
This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later. Because they are trained on progressively growing datasets, deep RL agents risk overfitting to their earliest experiences, which can degrade the rest of the learning process. Inspired by cognitive science, we refer to this effect as the primacy bias. Through a series of experiments, we dissect the algorithmic aspects of deep RL that exacerbate this bias. We then propose a simple yet generally applicable mechanism that tackles the primacy bias by periodically resetting a part of the agent. We apply this mechanism to algorithms in both discrete-action (Atari 100k) and continuous-action (DeepMind Control Suite) domains, consistently improving their performance.
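To make the reset mechanism concrete, here is a minimal PyTorch sketch of the idea described in the abstract: periodically re-initializing a part of the agent's network (here, the final layer of a Q-network) while the rest of the parameters and the replay buffer persist. The network sizes, the choice of which part to reset, the optimizer handling, and RESET_INTERVAL are illustrative assumptions, not values prescribed by the paper.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """A small value network: a feature encoder followed by a Q-value head."""
    def __init__(self, obs_dim: int, num_actions: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.head = nn.Linear(256, num_actions)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(obs))

def reset_part(net: QNetwork) -> None:
    # Re-initialize only the head; the encoder keeps its weights, and the
    # replay buffer (which lives outside the network) is untouched.
    net.head.reset_parameters()

RESET_INTERVAL = 200_000  # hypothetical schedule; tuned per domain in practice

q_net = QNetwork(obs_dim=8, num_actions=4)
optimizer = torch.optim.Adam(q_net.parameters(), lr=3e-4)

for step in range(1, 1_000_001):
    # ... sample a batch from the persistent replay buffer, compute the
    # TD loss, and take an optimizer step here ...
    if step % RESET_INTERVAL == 0:
        reset_part(q_net)
        # Rebuilding the optimizer discards stale Adam statistics for the
        # freshly reset parameters (an illustrative choice, not prescribed).
        optimizer = torch.optim.Adam(q_net.parameters(), lr=3e-4)
```

Because all collected experience is retained across resets, the agent can quickly relearn from the replay buffer after each reset rather than starting from scratch.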
Author Information
Evgenii Nikishin (Mila, Université de Montréal)
Max Schwarzer (Mila, Google Brain)
Pierluca D'Oro (Mila, Université de Montréal)
Pierre-Luc Bacon (Mila)
Aaron Courville (Université de Montréal)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Spotlight: The Primacy Bias in Deep Reinforcement Learning »
  Tue. Jul 19th, 06:50 -- 06:55 PM, Room 309
More from the Same Authors
- 2021: Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation »
  Evgenii Nikishin · Romina Abachi · Rishabh Agarwal · Pierre-Luc Bacon
- 2023 Poster: Understanding Plasticity in Neural Networks »
  Clare Lyle · Zeyu Zheng · Evgenii Nikishin · Bernardo Avila Pires · Razvan Pascanu · Will Dabney
- 2023 Poster: Bigger, Better, Faster: Human-level Atari with human-level efficiency »
  Max Schwarzer · Johan Obando Ceron · Aaron Courville · Marc Bellemare · Rishabh Agarwal · Pablo Samuel Castro
- 2023 Oral: Understanding Plasticity in Neural Networks »
  Clare Lyle · Zeyu Zheng · Evgenii Nikishin · Bernardo Avila Pires · Razvan Pascanu · Will Dabney
- 2022 Workshop: Decision Awareness in Reinforcement Learning »
  Evgenii Nikishin · Pierluca D'Oro · Doina Precup · Andre Barreto · Amir-massoud Farahmand · Pierre-Luc Bacon
- 2022 Poster: Direct Behavior Specification via Constrained Reinforcement Learning »
  Julien Roy · Roger Girgis · Joshua Romoff · Pierre-Luc Bacon · Christopher Pal
- 2022 Poster: Building Robust Ensembles via Margin Boosting »
  Dinghuai Zhang · Hongyang Zhang · Aaron Courville · Yoshua Bengio · Pradeep Ravikumar · Arun Sai Suggala
- 2022 Spotlight: Direct Behavior Specification via Constrained Reinforcement Learning »
  Julien Roy · Roger Girgis · Joshua Romoff · Pierre-Luc Bacon · Christopher Pal
- 2022 Spotlight: Building Robust Ensembles via Margin Boosting »
  Dinghuai Zhang · Hongyang Zhang · Aaron Courville · Yoshua Bengio · Pradeep Ravikumar · Arun Sai Suggala
- 2022 Poster: Generative Flow Networks for Discrete Probabilistic Modeling »
  Dinghuai Zhang · Nikolay Malkin · Zhen Liu · Alexandra Volokhova · Aaron Courville · Yoshua Bengio
- 2022 Spotlight: Generative Flow Networks for Discrete Probabilistic Modeling »
  Dinghuai Zhang · Nikolay Malkin · Zhen Liu · Alexandra Volokhova · Aaron Courville · Yoshua Bengio
- 2021 Poster: Can Subnetwork Structure Be the Key to Out-of-Distribution Generalization? »
  Dinghuai Zhang · Kartik Ahuja · Yilun Xu · Yisen Wang · Aaron Courville
- 2021 Oral: Can Subnetwork Structure Be the Key to Out-of-Distribution Generalization? »
  Dinghuai Zhang · Kartik Ahuja · Yilun Xu · Yisen Wang · Aaron Courville
- 2021 Poster: Continuous Coordination As a Realistic Scenario for Lifelong Learning »
  Hadi Nekoei · Akilesh Badrinaaraayanan · Aaron Courville · Sarath Chandar
- 2021 Spotlight: Continuous Coordination As a Realistic Scenario for Lifelong Learning »
  Hadi Nekoei · Akilesh Badrinaaraayanan · Aaron Courville · Sarath Chandar
- 2021 Poster: Out-of-Distribution Generalization via Risk Extrapolation (REx) »
  David Krueger · Ethan Caballero · Joern-Henrik Jacobsen · Amy Zhang · Jonathan Binas · Dinghuai Zhang · Remi Le Priol · Aaron Courville
- 2021 Oral: Out-of-Distribution Generalization via Risk Extrapolation (REx) »
  David Krueger · Ethan Caballero · Joern-Henrik Jacobsen · Amy Zhang · Jonathan Binas · Dinghuai Zhang · Remi Le Priol · Aaron Courville
- 2020 Poster: AR-DAE: Towards Unbiased Neural Entropy Gradient Estimation »
  Jae Hyun Lim · Aaron Courville · Christopher Pal · Chin-Wei Huang
- 2020 Poster: Countering Language Drift with Seeded Iterated Learning »
  Yuchen Lu · Soumye Singhal · Florian Strub · Aaron Courville · Olivier Pietquin
- 2019 Workshop: Invertible Neural Networks and Normalizing Flows »
  Chin-Wei Huang · David Krueger · Rianne Van den Berg · George Papamakarios · Aidan Gomez · Chris Cremer · Aaron Courville · Ricky T. Q. Chen · Danilo J. Rezende
- 2019: Poster discussion »
  Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shorato Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari
- 2019 Poster: Hierarchical Importance Weighted Autoencoders »
  Chin-Wei Huang · Kris Sankaran · Eeshan Dhekane · Alexandre Lacoste · Aaron Courville
- 2019 Oral: Hierarchical Importance Weighted Autoencoders »
  Chin-Wei Huang · Kris Sankaran · Eeshan Dhekane · Alexandre Lacoste · Aaron Courville