Timezone: »
Deep Reinforcement Learning (RL) agents have achieved superhuman performance on several video game suites. However, unlike humans the trained policies fail to transfer between related games or even between different levels of the same game. Recent works have shown that ideas such as data augmentation and learning domain invariant features can reduce this generalization gap. However, the transfer performance still remains unsatisfactory. In this work we use procedurally generated video games to empirically investigate several hypotheses to explain the lack of transfer. Contrary to the belief that lack of generalizable visual features results in poor policy generalization, we find that visual features transfer across levels, but the inability to use these features to predict actions in new levels limits the overall transfer. We also show that simple auxiliary tasks can improve generalization and lead to policies that transfer as well as the state of the art methods using data augmentation. Finally, to inform fruitful avenues for future research, we construct simple oracle methods that close the generalization gap.
Author Information
Anurag Ajay (MIT)
Ge Yang (University of Chicago)
Ofir Nachum (Google Brain)
Pulkit Agrawal (MIT)
More from the Same Authors
-
2020 : Learning Action Priors for Visuomotor transfer »
Anurag Ajay -
2021 : Topological Experience Replay for Fast Q-Learning »
Zhang-Wei Hong · Tao Chen · Yen-Chen Lin · Joni Pajarinen · Pulkit Agrawal -
2021 : SparseDice: Imitation Learning for Temporally Sparse Data via Regularization »
Alberto Camacho · Izzeddin Gur · Marcin Moczulski · Ofir Nachum · Aleksandra Faust -
2021 : Topological Experience Replay for Fast Q-Learning »
Zhang-Wei Hong · Tao Chen · Yen-Chen Lin · Joni Pajarinen · Pulkit Agrawal -
2022 : Distributionally Adaptive Meta Reinforcement Learning »
Anurag Ajay · Dibya Ghosh · Sergey Levine · Pulkit Agrawal · Abhishek Gupta -
2022 : Distributionally Adaptive Meta Reinforcement Learning »
Anurag Ajay · Dibya Ghosh · Sergey Levine · Pulkit Agrawal · Abhishek Gupta -
2023 : Visual Dexterity: In-hand Dexterous Manipulation from Depth »
Tao Chen · Megha Tippur · Siyang Wu · Vikash Kumar · Edward Adelson · Pulkit Agrawal -
2023 : Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-loop feedback »
Marcel Torne Villasevil · Max Balsells I Pamies · Zihan Wang · Samedh Desai · Tao Chen · Pulkit Agrawal · Abhishek Gupta -
2023 Poster: Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation »
Zechu Li · Tao Chen · Zhang-Wei Hong · Anurag Ajay · Pulkit Agrawal -
2023 Poster: Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation »
Andi Peng · Aviv Netanyahu · Mark Ho · Tianmin Shu · Andreea Bobu · Julie Shah · Pulkit Agrawal -
2023 Poster: Statistical Learning under Heterogenous Distribution Shift »
Max Simchowitz · Anurag Ajay · Pulkit Agrawal · Akshay Krishnamurthy -
2023 Poster: Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks »
Minyoung Huh · Brian Cheung · Pulkit Agrawal · Phillip Isola -
2023 Poster: TGRL: An Algorithm for Teacher Guided Reinforcement Learning »
Idan Shenfeld · Zhang-Wei Hong · Aviv Tamar · Pulkit Agrawal -
2022 Poster: Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error »
Scott Fujimoto · David Meger · Doina Precup · Ofir Nachum · Shixiang Gu -
2022 Poster: Model Selection in Batch Policy Optimization »
Jonathan Lee · George Tucker · Ofir Nachum · Bo Dai -
2022 Poster: Discovering Generalizable Spatial Goal Representations via Graph-based Active Reward Learning »
Aviv Netanyahu · Tianmin Shu · Josh Tenenbaum · Pulkit Agrawal -
2022 Spotlight: Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error »
Scott Fujimoto · David Meger · Doina Precup · Ofir Nachum · Shixiang Gu -
2022 Spotlight: Discovering Generalizable Spatial Goal Representations via Graph-based Active Reward Learning »
Aviv Netanyahu · Tianmin Shu · Josh Tenenbaum · Pulkit Agrawal -
2022 Spotlight: Model Selection in Batch Policy Optimization »
Jonathan Lee · George Tucker · Ofir Nachum · Bo Dai -
2022 Poster: Offline RL Policies Should Be Trained to be Adaptive »
Dibya Ghosh · Anurag Ajay · Pulkit Agrawal · Sergey Levine -
2022 Oral: Offline RL Policies Should Be Trained to be Adaptive »
Dibya Ghosh · Anurag Ajay · Pulkit Agrawal · Sergey Levine -
2021 Workshop: Self-Supervised Learning for Reasoning and Perception »
Pengtao Xie · Shanghang Zhang · Ishan Misra · Pulkit Agrawal · Katerina Fragkiadaki · Ruisi Zhang · Tassilo Klein · Asli Celikyilmaz · Mihaela van der Schaar · Eric Xing -
2021 Poster: Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning »
Hiroki Furuta · Tatsuya Matsushima · Tadashi Kozuno · Yutaka Matsuo · Sergey Levine · Ofir Nachum · Shixiang Gu -
2021 Poster: Offline Reinforcement Learning with Fisher Divergence Critic Regularization »
Ilya Kostrikov · Rob Fergus · Jonathan Tompson · Ofir Nachum -
2021 Poster: Representation Matters: Offline Pretraining for Sequential Decision Making »
Mengjiao Yang · Ofir Nachum -
2021 Spotlight: Representation Matters: Offline Pretraining for Sequential Decision Making »
Mengjiao Yang · Ofir Nachum -
2021 Spotlight: Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning »
Hiroki Furuta · Tatsuya Matsushima · Tadashi Kozuno · Yutaka Matsuo · Sergey Levine · Ofir Nachum · Shixiang Gu -
2021 Spotlight: Offline Reinforcement Learning with Fisher Divergence Critic Regularization »
Ilya Kostrikov · Rob Fergus · Jonathan Tompson · Ofir Nachum -
2021 Poster: Learning Task Informed Abstractions »
Xiang Fu · Ge Yang · Pulkit Agrawal · Tommi Jaakkola -
2021 Poster: World Model as a Graph: Learning Latent Landmarks for Planning »
Lunjun Zhang · Ge Yang · Bradly Stadie -
2021 Oral: World Model as a Graph: Learning Latent Landmarks for Planning »
Lunjun Zhang · Ge Yang · Bradly Stadie -
2021 Spotlight: Learning Task Informed Abstractions »
Xiang Fu · Ge Yang · Pulkit Agrawal · Tommi Jaakkola -
2019 : posters »
Zhengxing Chen · Juan Jose Garau Luis · Ignacio Albert Smet · Aditya Modi · Sabina Tomkins · Riley Simmons-Edler · Hongzi Mao · Alexander Irpan · Hao Lu · Rose Wang · Subhojyoti Mukherjee · Aniruddh Raghu · Syed Arbab Mohd Shihab · Byung Hoon Ahn · Rasool Fakoor · Pratik Chaudhari · Elena Smirnova · Min-hwan Oh · Xiaocheng Tang · Tony Qin · Qingyang Li · Marc Brittain · Ian Fox · Supratik Paul · Xiaofeng Gao · Yinlam Chow · Gabriel Dulac-Arnold · Ofir Nachum · Nikos Karampatziakis · Bharathan Balaji · Supratik Paul · Ali Davody · Djallel Bouneffouf · Himanshu Sahni · Soo Kim · Andrey Kolobov · Alexander Amini · Yao Liu · Xinshi Chen · · Craig Boutilier -
2019 Poster: DeepMDP: Learning Continuous Latent Space Models for Representation Learning »
Carles Gelada · Saurabh Kumar · Jacob Buckman · Ofir Nachum · Marc Bellemare -
2019 Oral: DeepMDP: Learning Continuous Latent Space Models for Representation Learning »
Carles Gelada · Saurabh Kumar · Jacob Buckman · Ofir Nachum · Marc Bellemare -
2018 Poster: Smoothed Action Value Functions for Learning Gaussian Policies »
Ofir Nachum · Mohammad Norouzi · George Tucker · Dale Schuurmans -
2018 Poster: Investigating Human Priors for Playing Video Games »
Rachit Dubey · Pulkit Agrawal · Deepak Pathak · Tom Griffiths · Alexei Efros -
2018 Oral: Smoothed Action Value Functions for Learning Gaussian Policies »
Ofir Nachum · Mohammad Norouzi · George Tucker · Dale Schuurmans -
2018 Oral: Investigating Human Priors for Playing Video Games »
Rachit Dubey · Pulkit Agrawal · Deepak Pathak · Tom Griffiths · Alexei Efros -
2018 Poster: Path Consistency Learning in Tsallis Entropy Regularized MDPs »
Yinlam Chow · Ofir Nachum · Mohammad Ghavamzadeh -
2018 Oral: Path Consistency Learning in Tsallis Entropy Regularized MDPs »
Yinlam Chow · Ofir Nachum · Mohammad Ghavamzadeh -
2017 Poster: Curiosity-driven Exploration by Self-supervised Prediction »
Deepak Pathak · Pulkit Agrawal · Alexei Efros · Trevor Darrell -
2017 Talk: Curiosity-driven Exploration by Self-supervised Prediction »
Deepak Pathak · Pulkit Agrawal · Alexei Efros · Trevor Darrell