How can an artificial agent learn to solve a wide range of tasks in a complex visual environment in the absence of external supervision? We decompose this question into two problems: global exploration of the environment, and learning to reliably reach situations found during exploration. We introduce the Explore Achieve Network (ExaNet), a unified solution to both problems that learns a world model from high-dimensional images and uses it to train an explorer policy and an achiever policy from imagined trajectories. Unlike prior methods that explore by reaching previously visited states, our explorer plans to discover unseen, surprising states through foresight; these states are then used as diverse targets for the achiever. After the unsupervised phase, ExaNet solves tasks specified by goal images without any additional learning. We introduce a challenging benchmark spanning four standard robotic manipulation and locomotion domains, with a total of over 40 test tasks. Our agent substantially outperforms previous approaches to unsupervised goal reaching and achieves goals that require interacting with multiple objects in sequence. Finally, to demonstrate the scalability and generality of our approach, we train a single general agent across four distinct environments. For videos, see https://sites.google.com/view/exanet/home.
Author Information
Russell Mendonca (Carnegie Mellon University)
Oleh Rybkin (University of Pennsylvania)
Oleh is a Ph.D. student in the GRASP laboratory at the University of Pennsylvania, advised by Kostas Daniilidis. He received his Bachelor's degree from Czech Technical University in Prague. He is interested in deep learning and computer vision and, more specifically, in using deep predictive models to discover semantic structure in video, as well as in applications of these models to planning. Prior to his Ph.D. studies, he worked on camera geometry as an undergraduate researcher advised by Tomas Pajdla. He was a visiting student researcher at INRIA advised by Josef Sivic, at Tokyo Institute of Technology advised by Akihiko Torii, and at UC Berkeley advised by Sergey Levine.
Kostas Daniilidis (University of Pennsylvania)
Danijar Hafner (Google Brain & University of Toronto)
Deepak Pathak (CMU, FAIR)
More from the Same Authors
-
2020 : Evaluating Agents without Rewards »
Danijar Hafner -
2021 : Intrinsic Control of Variational Beliefs in Dynamic Partially-Observed Visual Environments »
Nicholas Rhinehart · Jenny Wang · Glen Berseth · John Co-Reyes · Danijar Hafner · Chelsea Finn · Sergey Levine -
2023 : Internet Explorer: Targeted Representation Learning on the Open Web »
Alexander Li · Ellis Brown · Alexei Efros · Deepak Pathak -
2023 : Your Diffusion Model is Secretly a Zero-Shot Classifier »
Alexander Li · Mihir Prabhudesai · Shivam Duggal · Ellis Brown · Deepak Pathak -
2023 : Test-time Adaptation with Diffusion Models »
Mihir Prabhudesai · Tsung-Wei Ke · Alexander Li · Deepak Pathak · Katerina Fragkiadaki -
2023 Poster: Efficient RL via Disentangled Environment and Agent Representations »
Kevin Gmelin · Shikhar Bahl · Russell Mendonca · Deepak Pathak -
2023 Poster: Temporally Consistent Transformers for Video Generation »
Wilson Yan · Danijar Hafner · Stephen James · Pieter Abbeel -
2023 Oral: Efficient RL via Disentangled Environment and Agent Representations »
Kevin Gmelin · Shikhar Bahl · Russell Mendonca · Deepak Pathak -
2023 Poster: Internet Explorer: Targeted Representation Learning on the Open Web »
Alexander Li · Ellis Brown · Alexei Efros · Deepak Pathak -
2023 Poster: Test-time Adaptation with Slot-Centric Models »
Mihir Prabhudesai · Anirudh Goyal · Sujoy Paul · Sjoerd van Steenkiste · Mehdi S. M. Sajjadi · Gaurav Aggarwal · Thomas Kipf · Deepak Pathak · Katerina Fragkiadaki -
2022 Poster: Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents »
Wenlong Huang · Pieter Abbeel · Deepak Pathak · Igor Mordatch -
2022 Spotlight: Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents »
Wenlong Huang · Pieter Abbeel · Deepak Pathak · Igor Mordatch -
2022 Poster: Zero-Shot Reward Specification via Grounded Natural Language »
Parsa Mahmoudieh · Deepak Pathak · Trevor Darrell -
2022 Poster: REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer »
Xingyu Liu · Deepak Pathak · Kris Kitani -
2022 Spotlight: Zero-Shot Reward Specification via Grounded Natural Language »
Parsa Mahmoudieh · Deepak Pathak · Trevor Darrell -
2022 Oral: REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer »
Xingyu Liu · Deepak Pathak · Kris Kitani -
2022 Poster: Unified Fourier-based Kernel and Nonlinearity Design for Equivariant Networks on Homogeneous Spaces »
Yinshuang Xu · Jiahui Lei · Edgar Dobriban · Kostas Daniilidis -
2022 Spotlight: Unified Fourier-based Kernel and Nonlinearity Design for Equivariant Networks on Homogeneous Spaces »
Yinshuang Xu · Jiahui Lei · Edgar Dobriban · Kostas Daniilidis -
2021 : Panel Discussion »
Rosemary Nan Ke · Danijar Hafner · Pieter Abbeel · Chelsea Finn -
2021 : Oral Presentation: Discovering and Achieving Goals with World Models »
Oleh Rybkin · Deepak Pathak -
2021 : Invited Talk by Danijar Hafner »
Danijar Hafner -
2021 Poster: Simple and Effective VAE Training with Calibrated Decoders »
Oleh Rybkin · Kostas Daniilidis · Sergey Levine -
2021 Spotlight: Simple and Effective VAE Training with Calibrated Decoders »
Oleh Rybkin · Kostas Daniilidis · Sergey Levine -
2021 Poster: Differentiable Spatial Planning using Transformers »
Devendra Singh Chaplot · Deepak Pathak · Jitendra Malik -
2021 Spotlight: Differentiable Spatial Planning using Transformers »
Devendra Singh Chaplot · Deepak Pathak · Jitendra Malik -
2021 Poster: Unsupervised Learning of Visual 3D Keypoints for Control »
Boyuan Chen · Pieter Abbeel · Deepak Pathak -
2021 Poster: Model-Based Reinforcement Learning via Latent-Space Collocation »
Oleh Rybkin · Chuning Zhu · Anusha Nagabandi · Kostas Daniilidis · Igor Mordatch · Sergey Levine -
2021 Spotlight: Model-Based Reinforcement Learning via Latent-Space Collocation »
Oleh Rybkin · Chuning Zhu · Anusha Nagabandi · Kostas Daniilidis · Igor Mordatch · Sergey Levine -
2021 Spotlight: Unsupervised Learning of Visual 3D Keypoints for Control »
Boyuan Chen · Pieter Abbeel · Deepak Pathak -
2020 Poster: One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control »
Wenlong Huang · Igor Mordatch · Deepak Pathak -
2020 Poster: Planning to Explore via Self-Supervised World Models »
Ramanan Sekar · Oleh Rybkin · Kostas Daniilidis · Pieter Abbeel · Danijar Hafner · Deepak Pathak -
2019 Poster: Cross-Domain 3D Equivariant Image Embeddings »
Carlos Esteves · Avneesh Sud · Zhengyi Luo · Kostas Daniilidis · Ameesh Makadia -
2019 Poster: Learning Latent Dynamics for Planning from Pixels »
Danijar Hafner · Timothy Lillicrap · Ian Fischer · Ruben Villegas · David Ha · Honglak Lee · James Davidson -
2019 Oral: Cross-Domain 3D Equivariant Image Embeddings »
Carlos Esteves · Avneesh Sud · Zhengyi Luo · Kostas Daniilidis · Ameesh Makadia -
2019 Oral: Learning Latent Dynamics for Planning from Pixels »
Danijar Hafner · Timothy Lillicrap · Ian Fischer · Ruben Villegas · David Ha · Honglak Lee · James Davidson -
2019 Poster: Self-Supervised Exploration via Disagreement »
Deepak Pathak · Dhiraj Gandhi · Abhinav Gupta -
2019 Oral: Self-Supervised Exploration via Disagreement »
Deepak Pathak · Dhiraj Gandhi · Abhinav Gupta -
2018 Poster: Investigating Human Priors for Playing Video Games »
Rachit Dubey · Pulkit Agrawal · Deepak Pathak · Tom Griffiths · Alexei Efros -
2018 Oral: Investigating Human Priors for Playing Video Games »
Rachit Dubey · Pulkit Agrawal · Deepak Pathak · Tom Griffiths · Alexei Efros -
2017 Poster: Curiosity-driven Exploration by Self-supervised Prediction »
Deepak Pathak · Pulkit Agrawal · Alexei Efros · Trevor Darrell -
2017 Talk: Curiosity-driven Exploration by Self-supervised Prediction »
Deepak Pathak · Pulkit Agrawal · Alexei Efros · Trevor Darrell