Reinforcement learning can solve complex tasks; however, learning tends to be task-specific and sample efficiency remains a challenge. We present Plan2Explore, a self-supervised reinforcement learning agent that tackles both challenges through a new approach to self-supervised exploration and fast adaptation to new tasks, which need not be known during exploration. During exploration, unlike prior methods that retrospectively compute the novelty of observations after the agent has already reached them, our agent acts efficiently by leveraging planning to seek out expected future novelty. After exploration, the agent quickly adapts to multiple downstream tasks in a zero-shot or few-shot manner. We evaluate on challenging control tasks from high-dimensional image inputs. Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods and, in fact, almost matches the performance of an oracle that has access to rewards. Videos and code: https://ramanans1.github.io/plan2explore/
Author Information
Ramanan Sekar (University of Pennsylvania)
Oleh Rybkin (University of Pennsylvania)
Oleg is a Ph.D. student in the GRASP laboratory at the University of Pennsylvania, advised by Kostas Daniilidis. He received his Bachelor's degree from Czech Technical University in Prague. He is interested in deep learning and computer vision and, more specifically, in using deep predictive models to discover semantic structure in video and in applying these models to planning. Prior to his Ph.D. studies, he worked on camera geometry as an undergraduate researcher advised by Tomas Pajdla. He was a visiting student researcher at INRIA advised by Josef Sivic, at Tokyo Institute of Technology advised by Akihiko Torii, and at UC Berkeley advised by Sergey Levine.
Kostas Daniilidis (University of Pennsylvania)
Pieter Abbeel (UC Berkeley & Covariant)
Danijar Hafner (Google Brain & University of Toronto)
Deepak Pathak (CMU, FAIR)
More from the Same Authors
-
2020 : Evaluating Agents without Rewards »
Danijar Hafner -
2020 : Planning to Explore via Self-Supervised World Models »
Ramanan Sekar -
2021 : Discovering and Achieving Goals with World Models »
Russell Mendonca · Oleh Rybkin · Kostas Daniilidis · Danijar Hafner · Deepak Pathak -
2021 : Decision Transformer: Reinforcement Learning via Sequence Modeling »
Lili Chen · Kevin Lu · Aravind Rajeswaran · Kimin Lee · Aditya Grover · Michael Laskin · Pieter Abbeel · Aravind Srinivas · Igor Mordatch -
2021 : Data-Efficient Exploration with Self Play for Atari »
Michael Laskin · Catherine Cang · Ryan Rudes · Pieter Abbeel -
2021 : Intrinsic Control of Variational Beliefs in Dynamic Partially-Observed Visual Environments »
Nicholas Rhinehart · Jenny Wang · Glen Berseth · John Co-Reyes · Danijar Hafner · Chelsea Finn · Sergey Levine -
2021 : Hierarchical Few-Shot Imitation with Skill Transition Models »
Kourosh Hakhamaneshi · Ruihan Zhao · Albert Zhan · Pieter Abbeel · Michael Laskin -
2021 : Explaining Reinforcement Learning Policies through Counterfactual Trajectories »
Julius Frost · Olivia Watkins · Eric Weiner · Pieter Abbeel · Trevor Darrell · Bryan Plummer · Kate Saenko -
2022 : Multimodal Masked Autoencoders Learn Transferable Representations »
Xinyang Geng · Hao Liu · Lisa Lee · Dale Schuurmans · Sergey Levine · Pieter Abbeel -
2023 Poster: Internet Explorer: Targeted Representation Learning on the Open Web »
Alexander Li · Ellis Brown · Alexei Efros · Deepak Pathak -
2023 Poster: Masked Trajectory Models for Prediction, Representation, and Control »
Philipp Wu · Arjun Majumdar · Kevin Stone · Yixin Lin · Igor Mordatch · Pieter Abbeel · Aravind Rajeswaran -
2023 Poster: Test-time Adaptation with Slot-Centric Models »
Mihir Prabhudesai · Anirudh Goyal · Sujoy Paul · Sjoerd van Steenkiste · Mehdi S. M. Sajjadi · Gaurav Aggarwal · Thomas Kipf · Deepak Pathak · Katerina Fragkiadaki -
2023 Poster: The Wisdom of Hindsight Makes Language Models Better Instruction Followers »
Tianjun Zhang · Fangchen Liu · Justin Wong · Pieter Abbeel · Joseph E Gonzalez -
2023 Poster: Efficient RL via Disentangled Environment and Agent Representations »
Kevin Gmelin · Shikhar Bahl · Russell Mendonca · Deepak Pathak -
2023 Poster: Guiding Pretraining in Reinforcement Learning with Large Language Models »
Yuqing Du · Olivia Watkins · Zihan Wang · Cédric Colas · Trevor Darrell · Pieter Abbeel · Abhishek Gupta · Jacob Andreas -
2023 Poster: Emergent Agentic Transformer from Chain of Hindsight Experience »
Hao Liu · Pieter Abbeel -
2023 Poster: Temporally Consistent Transformers for Video Generation »
Wilson Yan · Danijar Hafner · Stephen James · Pieter Abbeel -
2023 Poster: CLUTR: Curriculum Learning via Unsupervised Task Representation Learning »
Abdus Salam Azad · Izzeddin Gur · Jasper Emhoff · Nathaniel Alexis · Aleksandra Faust · Pieter Abbeel · Ion Stoica -
2023 Poster: Controllability-Aware Unsupervised Skill Discovery »
Seohong Park · Kimin Lee · Youngwoon Lee · Pieter Abbeel -
2023 Poster: Multi-Environment Pretraining Enables Transfer to Action Limited Datasets »
David Venuto · Mengjiao Yang · Pieter Abbeel · Doina Precup · Igor Mordatch · Ofir Nachum -
2023 Poster: Multi-View Masked World Models for Visual Robotic Manipulation »
Younggyo Seo · Junsu Kim · Stephen James · Kimin Lee · Jinwoo Shin · Pieter Abbeel -
2023 Oral: Efficient RL via Disentangled Environment and Agent Representations »
Kevin Gmelin · Shikhar Bahl · Russell Mendonca · Deepak Pathak -
2022 Poster: Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks »
Litian Liang · Yaosheng Xu · Stephen McAleer · Dailin Hu · Alexander Ihler · Pieter Abbeel · Roy Fox -
2022 Poster: Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents »
Wenlong Huang · Pieter Abbeel · Deepak Pathak · Igor Mordatch -
2022 Spotlight: Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks »
Litian Liang · Yaosheng Xu · Stephen McAleer · Dailin Hu · Alexander Ihler · Pieter Abbeel · Roy Fox -
2022 Spotlight: Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents »
Wenlong Huang · Pieter Abbeel · Deepak Pathak · Igor Mordatch -
2022 Poster: Reinforcement Learning with Action-Free Pre-Training from Videos »
Younggyo Seo · Kimin Lee · Stephen James · Pieter Abbeel -
2022 Poster: Zero-Shot Reward Specification via Grounded Natural Language »
Parsa Mahmoudieh · Deepak Pathak · Trevor Darrell -
2022 Poster: REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer »
Xingyu Liu · Deepak Pathak · Kris Kitani -
2022 Spotlight: Reinforcement Learning with Action-Free Pre-Training from Videos »
Younggyo Seo · Kimin Lee · Stephen James · Pieter Abbeel -
2022 Spotlight: Zero-Shot Reward Specification via Grounded Natural Language »
Parsa Mahmoudieh · Deepak Pathak · Trevor Darrell -
2022 Oral: REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer »
Xingyu Liu · Deepak Pathak · Kris Kitani -
2022 Poster: Unified Fourier-based Kernel and Nonlinearity Design for Equivariant Networks on Homogeneous Spaces »
Yinshuang Xu · Jiahui Lei · Edgar Dobriban · Kostas Daniilidis -
2022 Spotlight: Unified Fourier-based Kernel and Nonlinearity Design for Equivariant Networks on Homogeneous Spaces »
Yinshuang Xu · Jiahui Lei · Edgar Dobriban · Kostas Daniilidis -
2021 : Panel Discussion »
Rosemary Nan Ke · Danijar Hafner · Pieter Abbeel · Chelsea Finn -
2021 : Invited Talk by Pieter Abbeel »
Pieter Abbeel -
2021 : Oral Presentation: Discovering and Achieving Goals with World Models »
Oleh Rybkin · Deepak Pathak -
2021 : Invited Talk by Danijar Hafner »
Danijar Hafner -
2021 Poster: Simple and Effective VAE Training with Calibrated Decoders »
Oleh Rybkin · Kostas Daniilidis · Sergey Levine -
2021 Spotlight: Simple and Effective VAE Training with Calibrated Decoders »
Oleh Rybkin · Kostas Daniilidis · Sergey Levine -
2021 Poster: Decoupling Representation Learning from Reinforcement Learning »
Adam Stooke · Kimin Lee · Pieter Abbeel · Michael Laskin -
2021 Spotlight: Decoupling Representation Learning from Reinforcement Learning »
Adam Stooke · Kimin Lee · Pieter Abbeel · Michael Laskin -
2021 Poster: APS: Active Pretraining with Successor Features »
Hao Liu · Pieter Abbeel -
2021 Poster: SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning »
Kimin Lee · Michael Laskin · Aravind Srinivas · Pieter Abbeel -
2021 Spotlight: SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning »
Kimin Lee · Michael Laskin · Aravind Srinivas · Pieter Abbeel -
2021 Oral: APS: Active Pretraining with Successor Features »
Hao Liu · Pieter Abbeel -
2021 Poster: PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training »
Kimin Lee · Laura Smith · Pieter Abbeel -
2021 Poster: Differentiable Spatial Planning using Transformers »
Devendra Singh Chaplot · Deepak Pathak · Jitendra Malik -
2021 Oral: PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training »
Kimin Lee · Laura Smith · Pieter Abbeel -
2021 Spotlight: Differentiable Spatial Planning using Transformers »
Devendra Singh Chaplot · Deepak Pathak · Jitendra Malik -
2021 Poster: Unsupervised Learning of Visual 3D Keypoints for Control »
Boyuan Chen · Pieter Abbeel · Deepak Pathak -
2021 Poster: State Entropy Maximization with Random Encoders for Efficient Exploration »
Younggyo Seo · Lili Chen · Jinwoo Shin · Honglak Lee · Pieter Abbeel · Kimin Lee -
2021 Poster: MSA Transformer »
Roshan Rao · Jason Liu · Robert Verkuil · Joshua Meier · John Canny · Pieter Abbeel · Tom Sercu · Alexander Rives -
2021 Poster: Model-Based Reinforcement Learning via Latent-Space Collocation »
Oleh Rybkin · Chuning Zhu · Anusha Nagabandi · Kostas Daniilidis · Igor Mordatch · Sergey Levine -
2021 Spotlight: MSA Transformer »
Roshan Rao · Jason Liu · Robert Verkuil · Joshua Meier · John Canny · Pieter Abbeel · Tom Sercu · Alexander Rives -
2021 Spotlight: Model-Based Reinforcement Learning via Latent-Space Collocation »
Oleh Rybkin · Chuning Zhu · Anusha Nagabandi · Kostas Daniilidis · Igor Mordatch · Sergey Levine -
2021 Spotlight: State Entropy Maximization with Random Encoders for Efficient Exploration »
Younggyo Seo · Lili Chen · Jinwoo Shin · Honglak Lee · Pieter Abbeel · Kimin Lee -
2021 Spotlight: Unsupervised Learning of Visual 3D Keypoints for Control »
Boyuan Chen · Pieter Abbeel · Deepak Pathak -
2021 : Part 2: Unsupervised Pre-Training in RL »
Pieter Abbeel -
2021 Tutorial: Unsupervised Learning for Reinforcement Learning »
Aravind Srinivas · Pieter Abbeel -
2020 Poster: CURL: Contrastive Unsupervised Representations for Reinforcement Learning »
Michael Laskin · Aravind Srinivas · Pieter Abbeel -
2020 Poster: One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control »
Wenlong Huang · Igor Mordatch · Deepak Pathak -
2020 Poster: Hallucinative Topological Memory for Zero-Shot Visual Planning »
Kara Liu · Thanard Kurutach · Christine Tung · Pieter Abbeel · Aviv Tamar -
2020 Poster: Responsive Safety in Reinforcement Learning by PID Lagrangian Methods »
Adam Stooke · Joshua Achiam · Pieter Abbeel -
2020 Poster: Variable Skipping for Autoregressive Range Density Estimation »
Eric Liang · Zongheng Yang · Ion Stoica · Pieter Abbeel · Yan Duan · Peter Chen -
2020 Poster: Hierarchically Decoupled Imitation For Morphological Transfer »
Donald Hejna · Lerrel Pinto · Pieter Abbeel -
2019 Workshop: Workshop on Self-Supervised Learning »
Aaron van den Oord · Yusuf Aytar · Carl Doersch · Carl Vondrick · Alec Radford · Pierre Sermanet · Amir Zamir · Pieter Abbeel -
2019 Poster: Cross-Domain 3D Equivariant Image Embeddings »
Carlos Esteves · Avneesh Sud · Zhengyi Luo · Kostas Daniilidis · Ameesh Makadia -
2019 Poster: Learning Latent Dynamics for Planning from Pixels »
Danijar Hafner · Timothy Lillicrap · Ian Fischer · Ruben Villegas · David Ha · Honglak Lee · James Davidson -
2019 Poster: Bit-Swap: Recursive Bits-Back Coding for Lossless Compression with Hierarchical Latent Variables »
Friso Kingma · Pieter Abbeel · Jonathan Ho -
2019 Poster: On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference »
Rohin Shah · Noah Gundotra · Pieter Abbeel · Anca Dragan -
2019 Oral: Cross-Domain 3D Equivariant Image Embeddings »
Carlos Esteves · Avneesh Sud · Zhengyi Luo · Kostas Daniilidis · Ameesh Makadia -
2019 Oral: On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference »
Rohin Shah · Noah Gundotra · Pieter Abbeel · Anca Dragan -
2019 Oral: Bit-Swap: Recursive Bits-Back Coding for Lossless Compression with Hierarchical Latent Variables »
Friso Kingma · Pieter Abbeel · Jonathan Ho -
2019 Oral: Learning Latent Dynamics for Planning from Pixels »
Danijar Hafner · Timothy Lillicrap · Ian Fischer · Ruben Villegas · David Ha · Honglak Lee · James Davidson -
2019 Poster: Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules »
Daniel Ho · Eric Liang · Peter Chen · Ion Stoica · Pieter Abbeel -
2019 Poster: Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design »
Jonathan Ho · Peter Chen · Aravind Srinivas · Rocky Duan · Pieter Abbeel -
2019 Poster: Self-Supervised Exploration via Disagreement »
Deepak Pathak · Dhiraj Gandhi · Abhinav Gupta -
2019 Poster: SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning »
Marvin Zhang · Sharad Vikram · Laura Smith · Pieter Abbeel · Matthew Johnson · Sergey Levine -
2019 Oral: Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design »
Jonathan Ho · Peter Chen · Aravind Srinivas · Rocky Duan · Pieter Abbeel -
2019 Oral: Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules »
Daniel Ho · Eric Liang · Peter Chen · Ion Stoica · Pieter Abbeel -
2019 Oral: Self-Supervised Exploration via Disagreement »
Deepak Pathak · Dhiraj Gandhi · Abhinav Gupta -
2019 Oral: SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning »
Marvin Zhang · Sharad Vikram · Laura Smith · Pieter Abbeel · Matthew Johnson · Sergey Levine -
2018 Poster: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor »
Tuomas Haarnoja · Aurick Zhou · Pieter Abbeel · Sergey Levine -
2018 Poster: Investigating Human Priors for Playing Video Games »
Rachit Dubey · Pulkit Agrawal · Deepak Pathak · Tom Griffiths · Alexei Efros -
2018 Poster: PixelSNAIL: An Improved Autoregressive Generative Model »
Xi Chen · Nikhil Mishra · Mostafa Rohaninejad · Pieter Abbeel -
2018 Oral: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor »
Tuomas Haarnoja · Aurick Zhou · Pieter Abbeel · Sergey Levine -
2018 Oral: PixelSNAIL: An Improved Autoregressive Generative Model »
Xi Chen · Nikhil Mishra · Mostafa Rohaninejad · Pieter Abbeel -
2018 Oral: Investigating Human Priors for Playing Video Games »
Rachit Dubey · Pulkit Agrawal · Deepak Pathak · Tom Griffiths · Alexei Efros -
2018 Poster: Automatic Goal Generation for Reinforcement Learning Agents »
Carlos Florensa · David Held · Xinyang Geng · Pieter Abbeel -
2018 Poster: Latent Space Policies for Hierarchical Reinforcement Learning »
Tuomas Haarnoja · Kristian Hartikainen · Pieter Abbeel · Sergey Levine -
2018 Poster: Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings »
John Co-Reyes · Yu Xuan Liu · Abhishek Gupta · Benjamin Eysenbach · Pieter Abbeel · Sergey Levine -
2018 Poster: Universal Planning Networks: Learning Generalizable Representations for Visuomotor Control »
Aravind Srinivas · Allan Jabri · Pieter Abbeel · Sergey Levine · Chelsea Finn -
2018 Oral: Universal Planning Networks: Learning Generalizable Representations for Visuomotor Control »
Aravind Srinivas · Allan Jabri · Pieter Abbeel · Sergey Levine · Chelsea Finn -
2018 Oral: Automatic Goal Generation for Reinforcement Learning Agents »
Carlos Florensa · David Held · Xinyang Geng · Pieter Abbeel -
2018 Oral: Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings »
John Co-Reyes · Yu Xuan Liu · Abhishek Gupta · Benjamin Eysenbach · Pieter Abbeel · Sergey Levine -
2018 Oral: Latent Space Policies for Hierarchical Reinforcement Learning »
Tuomas Haarnoja · Kristian Hartikainen · Pieter Abbeel · Sergey Levine -
2017 Poster: Curiosity-driven Exploration by Self-supervised Prediction »
Deepak Pathak · Pulkit Agrawal · Alexei Efros · Trevor Darrell -
2017 Poster: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks »
Chelsea Finn · Pieter Abbeel · Sergey Levine -
2017 Poster: Prediction and Control with Temporal Segment Models »
Nikhil Mishra · Pieter Abbeel · Igor Mordatch -
2017 Poster: Reinforcement Learning with Deep Energy-Based Policies »
Tuomas Haarnoja · Haoran Tang · Pieter Abbeel · Sergey Levine -
2017 Poster: Constrained Policy Optimization »
Joshua Achiam · David Held · Aviv Tamar · Pieter Abbeel -
2017 Talk: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks »
Chelsea Finn · Pieter Abbeel · Sergey Levine -
2017 Talk: Curiosity-driven Exploration by Self-supervised Prediction »
Deepak Pathak · Pulkit Agrawal · Alexei Efros · Trevor Darrell -
2017 Talk: Prediction and Control with Temporal Segment Models »
Nikhil Mishra · Pieter Abbeel · Igor Mordatch -
2017 Talk: Reinforcement Learning with Deep Energy-Based Policies »
Tuomas Haarnoja · Haoran Tang · Pieter Abbeel · Sergey Levine -
2017 Talk: Constrained Policy Optimization »
Joshua Achiam · David Held · Aviv Tamar · Pieter Abbeel