Poster
Investigating the Role of Model-Based Learning in Exploration and Transfer
Jacob C Walker · Eszter Vértes · Yazhe Li · Gabriel Dulac-Arnold · Ankesh Anand · Theophane Weber · Jessica Hamrick

Tue Jul 25 05:00 PM -- 06:30 PM (PDT) @ Exhibit Hall 1 #204

State-of-the-art reinforcement learning has enabled training agents on tasks of ever-increasing complexity. However, the current paradigm tends to favor training agents from scratch on every new task or on collections of tasks with a view towards generalizing to novel task configurations. The former suffers from poor data efficiency while the latter is difficult when test tasks are out of distribution. Agents that can effectively transfer their knowledge about the world offer a potential solution to these issues. In this paper, we investigate transfer learning in the context of model-based agents. Specifically, we aim to understand where exactly environment models have an advantage and why. We find that a model-based approach outperforms controlled model-free baselines for transfer learning. Through ablations, we show that both the policy and the dynamics model learnt through exploration matter for successful transfer. We demonstrate our results across three domains which vary in their requirements for transfer: in-distribution procedural (Crafter), in-distribution identical (RoboDesk), and out-of-distribution (Meta-World). Our results show that intrinsic exploration combined with environment models presents a viable direction towards agents that are self-supervised and able to generalize to novel reward functions.

Author Information

Jacob C Walker (Google DeepMind)
Eszter Vértes (Google DeepMind)
Yazhe Li (Google DeepMind)
Gabriel Dulac-Arnold (Google Research)
Ankesh Anand (Google DeepMind)
Theophane Weber (Google DeepMind)
Jessica Hamrick (DeepMind)
Jessica Hamrick

Jessica Hamrick is a Senior Research Scientist at DeepMind, where she studies how to build machines that can flexibly build and deploy models of the world as well as humans can. Her work combines insights from cognitive science with structured relational architectures, model-based deep reinforcement learning, and planning. Jessica received her Ph.D. in Psychology from UC Berkeley, and her M.Eng. in Computer Science and Engineering from MIT.

More from the Same Authors

  • 2021 : CoBERL: Contrastive BERT for Reinforcement Learning »
    Andrea Banino · Adrià Puigdomenech Badia · Jacob C Walker · Tim Scholtes · Jovana Mitrovic · Charles Blundell
  • 2022 : Learning to induce causal structure »
    Rosemary Nan Ke · Silvia Chiappa · Jane Wang · Jorg Bornschein · Anirudh Goyal · Melanie Rey · Matthew Botvinick · Theophane Weber · Michael Mozer · Danilo J. Rezende
  • 2023 Oral: Quantile Credit Assignment »
    Thomas Mesnard · Wenqi Chen · Alaa Saade · Yunhao Tang · Mark Rowland · Theophane Weber · Clare Lyle · Audrunas Gruslys · Michal Valko · Will Dabney · Georg Ostrovski · Eric Moulines · Remi Munos
  • 2023 Poster: Quantile Credit Assignment »
    Thomas Mesnard · Wenqi Chen · Alaa Saade · Yunhao Tang · Mark Rowland · Theophane Weber · Clare Lyle · Audrunas Gruslys · Michal Valko · Will Dabney · Georg Ostrovski · Eric Moulines · Remi Munos
  • 2022 Poster: Retrieval-Augmented Reinforcement Learning »
    Anirudh Goyal · Abe Friesen · Andrea Banino · Theophane Weber · Nan Rosemary Ke · Adrià Puigdomenech Badia · Arthur Guez · Mehdi Mirza · Peter Humphreys · Ksenia Konyushkova · Michal Valko · Simon Osindero · Timothy Lillicrap · Nicolas Heess · Charles Blundell
  • 2022 Spotlight: Retrieval-Augmented Reinforcement Learning »
    Anirudh Goyal · Abe Friesen · Andrea Banino · Theophane Weber · Nan Rosemary Ke · Adrià Puigdomenech Badia · Arthur Guez · Mehdi Mirza · Peter Humphreys · Ksenia Konyushkova · Michal Valko · Simon Osindero · Timothy Lillicrap · Nicolas Heess · Charles Blundell
  • 2022 Poster: Model-Value Inconsistency as a Signal for Epistemic Uncertainty »
    Angelos Filos · Eszter Vértes · Zita Marinho · Gregory Farquhar · Diana Borsa · Abe Friesen · Feryal Behbahani · Tom Schaul · Andre Barreto · Simon Osindero
  • 2022 Spotlight: Model-Value Inconsistency as a Signal for Epistemic Uncertainty »
    Angelos Filos · Eszter Vértes · Zita Marinho · Gregory Farquhar · Diana Borsa · Abe Friesen · Feryal Behbahani · Tom Schaul · Andre Barreto · Simon Osindero
  • 2021 Poster: Counterfactual Credit Assignment in Model-Free Reinforcement Learning »
    Thomas Mesnard · Theophane Weber · Fabio Viola · Shantanu Thakoor · Alaa Saade · Anna Harutyunyan · Will Dabney · Thomas Stepleton · Nicolas Heess · Arthur Guez · Eric Moulines · Marcus Hutter · Lars Buesing · Remi Munos
  • 2021 Poster: Muesli: Combining Improvements in Policy Optimization »
    Matteo Hessel · Ivo Danihelka · Fabio Viola · Arthur Guez · Simon Schmitt · Laurent Sifre · Theophane Weber · David Silver · Hado van Hasselt
  • 2021 Spotlight: Counterfactual Credit Assignment in Model-Free Reinforcement Learning »
    Thomas Mesnard · Theophane Weber · Fabio Viola · Shantanu Thakoor · Alaa Saade · Anna Harutyunyan · Will Dabney · Thomas Stepleton · Nicolas Heess · Arthur Guez · Eric Moulines · Marcus Hutter · Lars Buesing · Remi Munos
  • 2021 Spotlight: Muesli: Combining Improvements in Policy Optimization »
    Matteo Hessel · Ivo Danihelka · Fabio Viola · Arthur Guez · Simon Schmitt · Laurent Sifre · Theophane Weber · David Silver · Hado van Hasselt
  • 2020 Workshop: Inductive Biases, Invariances and Generalization in Reinforcement Learning »
    Anirudh Goyal · Rosemary Nan Ke · Jane Wang · Stefan Bauer · Theophane Weber · Fabio Viola · Bernhard Schölkopf · Stefan Bauer
  • 2020 Workshop: Graph Representation Learning and Beyond (GRL+) »
    Petar Veličković · Michael M. Bronstein · Andreea Deac · Will Hamilton · Jessica Hamrick · Milad Hashemi · Stefanie Jegelka · Jure Leskovec · Renjie Liao · Federico Monti · Yizhou Sun · Kevin Swersky · Rex (Zhitao) Ying · Marinka Zitnik
  • 2020 Tutorial: Model-Based Methods in Reinforcement Learning »
    Igor Mordatch · Jessica Hamrick
  • 2019 : posters »
    Zhengxing Chen · Juan Jose Garau Luis · Ignacio Albert Smet · Aditya Modi · Sabina Tomkins · Riley Simmons-Edler · Hongzi Mao · Alexander Irpan · Hao Lu · Rose Wang · Subhojyoti Mukherjee · Aniruddh Raghu · Syed Arbab Mohd Shihab · Byung Hoon Ahn · Rasool Fakoor · Pratik Chaudhari · Elena Smirnova · Min-hwan Oh · Xiaocheng Tang · Tony Qin · Qingyang Li · Marc Brittain · Ian Fox · Supratik Paul · Xiaofeng Gao · Yinlam Chow · Gabriel Dulac-Arnold · Ofir Nachum · Nikos Karampatziakis · Bharathan Balaji · Supratik Paul · Ali Davody · Djallel Bouneffouf · Himanshu Sahni · Soo Kim · Andrey Kolobov · Alexander Amini · Yao Liu · Xinshi Chen · Craig Boutilier
  • 2019 Poster: An Investigation of Model-Free Planning »
    Arthur Guez · Mehdi Mirza · Karol Gregor · Rishabh Kabra · Sebastien Racaniere · Theophane Weber · David Raposo · Adam Santoro · Laurent Orseau · Tom Eccles · Greg Wayne · David Silver · Timothy Lillicrap
  • 2019 Oral: An Investigation of Model-Free Planning »
    Arthur Guez · Mehdi Mirza · Karol Gregor · Rishabh Kabra · Sebastien Racaniere · Theophane Weber · David Raposo · Adam Santoro · Laurent Orseau · Tom Eccles · Greg Wayne · David Silver · Timothy Lillicrap
  • 2018 Poster: Learning to search with MCTSnets »
    Arthur Guez · Theophane Weber · Ioannis Antonoglou · Karen Simonyan · Oriol Vinyals · Daan Wierstra · Remi Munos · David Silver
  • 2018 Oral: Learning to search with MCTSnets »
    Arthur Guez · Theophane Weber · Ioannis Antonoglou · Karen Simonyan · Oriol Vinyals · Daan Wierstra · Remi Munos · David Silver
  • 2017 Poster: The Predictron: End-To-End Learning and Planning »
    David Silver · Hado van Hasselt · Matteo Hessel · Tom Schaul · Arthur Guez · Tim Harley · Gabriel Dulac-Arnold · David Reichert · Neil Rabinowitz · Andre Barreto · Thomas Degris
  • 2017 Talk: The Predictron: End-To-End Learning and Planning »
    David Silver · Hado van Hasselt · Matteo Hessel · Tom Schaul · Arthur Guez · Tim Harley · Gabriel Dulac-Arnold · David Reichert · Neil Rabinowitz · Andre Barreto · Thomas Degris