Demonstration-Conditioned Reinforcement Learning for Few-Shot Imitation

Christopher Dance · Perez Julien · Théo Cachet


Keywords: [ Planning and Control ] [ Reinforcement Learning and Planning ]

[ Abstract ]
[ Slides
[ Paper ]
[ Visit Poster at Spot D0 in Virtual World ]
Tue 20 Jul 9 a.m. PDT — 11 a.m. PDT
Spotlight presentation: Deep Reinforcement Learning 2
Tue 20 Jul 5 a.m. PDT — 6 a.m. PDT


In few-shot imitation, an agent is given a few demonstrations of a previously unseen task, and must then successfully perform that task. We propose a novel approach to learning few-shot-imitation agents that we call demonstration-conditioned reinforcement learning (DCRL). Given a training set consisting of demonstrations, reward functions and transition distributions for multiple tasks, the idea is to work with a policy that takes demonstrations as input, and to train this policy to maximize the average of the cumulative reward over the set of training tasks. Relative to previously proposed few-shot imitation methods that use behaviour cloning or infer reward functions from demonstrations, our method has the disadvantage that it requires reward functions at training time. However, DCRL also has several advantages, such as the ability to improve upon suboptimal demonstrations, to operate given state-only demonstrations, and to cope with a domain shift between the demonstrator and the agent. Moreover, we show that DCRL outperforms methods based on behaviour cloning by a large margin, on navigation tasks and on robotic manipulation tasks from the Meta-World benchmark.

Chat is not available.