Identifying algorithms that flexibly and efficiently discover temporally-extended multi-phase plans is an essential step for the advancement of robotics and model-based reinforcement learning. The core problem of long-range planning is finding an efficient way to search through the tree of possible action sequences. Existing non-learned planning solutions from the Task and Motion Planning (TAMP) literature rely on logical descriptions of the effects and preconditions of actions. This constraint allows TAMP methods to prune the search tree efficiently but limits their ability to generalize to unseen and complex physical environments. In contrast, deep reinforcement learning (DRL) methods use flexible neural-network-based function approximators to discover policies that generalize naturally to unseen circumstances. However, DRL methods struggle to handle the very sparse reward landscapes inherent to long-range multi-step planning situations. Here, we propose the Curious Sample Planner (CSP), which fuses elements of TAMP and DRL by combining a curiosity-guided sampling strategy with imitation learning to accelerate planning. We show that CSP can efficiently discover interesting and complex temporally-extended plans for solving a wide range of physically realistic 3D tasks. In contrast, standard planning and learning methods often fail to solve these tasks at all or do so only with a huge and highly variable number of training samples. We explore the use of a variety of curiosity metrics with CSP and analyze the types of solutions that CSP discovers. Finally, we show that CSP supports task transfer so that the exploration policies learned during experience with one task can help improve efficiency on related tasks.
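To make the high-level idea concrete, the following is a minimal sketch of a curiosity-guided, sample-based tree planner, assuming a toy deterministic environment and a simple count-based novelty bonus in place of a learned curiosity module. It omits the imitation-learning component and any physically realistic 3D dynamics, and all names (ToyEnv, novelty_score, curious_sample_planner) are illustrative rather than the authors' implementation.

```python
# Hedged sketch of curiosity-guided sampling for tree search (not the CSP code).
import math
import random


class ToyEnv:
    """Stand-in deterministic environment: reach a target integer state."""
    def __init__(self, start=0, goal=7, actions=(-1, +1, +2)):
        self.start, self.goal, self.actions = start, goal, actions

    def step(self, state, action):
        return state + action

    def is_goal(self, state):
        return state == self.goal


def novelty_score(state, visit_counts):
    """Count-based curiosity: rarely visited states score higher.
    (A learned-dynamics prediction error could be substituted here.)"""
    return 1.0 / math.sqrt(1 + visit_counts.get(state, 0))


def curious_sample_planner(env, max_expansions=500, seed=0):
    rng = random.Random(seed)
    tree = {env.start: None}          # child -> parent, for plan recovery
    frontier = [env.start]
    visit_counts = {}
    for _ in range(max_expansions):
        # Sample a node to expand, weighted by its curiosity (novelty) score.
        weights = [novelty_score(s, visit_counts) for s in frontier]
        state = rng.choices(frontier, weights=weights, k=1)[0]
        visit_counts[state] = visit_counts.get(state, 0) + 1
        action = rng.choice(env.actions)
        nxt = env.step(state, action)
        if nxt not in tree:
            tree[nxt] = state
            frontier.append(nxt)
        if env.is_goal(nxt):
            # Recover the plan by walking parents back to the start state.
            plan = [nxt]
            while tree[plan[-1]] is not None:
                plan.append(tree[plan[-1]])
            return list(reversed(plan))
    return None


if __name__ == "__main__":
    print(curious_sample_planner(ToyEnv()))
```

In the full method, the curiosity signal would additionally train a sampling policy by imitation, so that experience from one task can bias expansion toward useful subgoals on related tasks.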
Author Information
Aidan Curtis (Rice University)
Minjian Xin (Shanghai Jiao Tong University)
Dilip Arumugam (Stanford University)
Kevin Feigelis (Stanford University)
Daniel Yamins (Stanford University)
More from the Same Authors
- 2021: Bad-Policy Density: A Measure of Reinforcement-Learning Hardness
  David Abel · Cameron Allen · Dilip Arumugam · D Ellis Hershkowitz · Michael L. Littman · Lawson Wong
- 2022: Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning
  Dilip Arumugam · Benjamin Van Roy
- 2021 Poster: Deciding What to Learn: A Rate-Distortion Approach
  Dilip Arumugam · Benjamin Van Roy
- 2021 Spotlight: Deciding What to Learn: A Rate-Distortion Approach
  Dilip Arumugam · Benjamin Van Roy
- 2020 Poster: Active World Model Learning in Agent-rich Environments with Progress Curiosity
  Kuno Kim · Megumi Sano · Julian De Freitas · Nick Haber · Daniel Yamins
- 2020 Poster: Visual Grounding of Learned Physical Models
  Yunzhu Li · Toru Lin · Kexin Yi · Daniel Bear · Daniel Yamins · Jiajun Wu · Josh Tenenbaum · Antonio Torralba
- 2020 Poster: Two Routes to Scalable Credit Assignment without Weight Symmetry
  Daniel Kunin · Aran Nayebi · Javier Sagastuy-Brena · Surya Ganguli · Jonathan Bloom · Daniel Yamins