

Oral

Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables

Kate Rakelly · Aurick Zhou · Chelsea Finn · Sergey Levine · Deirdre Quillen


Abstract:

Deep reinforcement learning algorithms require large amounts of experience to learn an individual task. While in principle meta-reinforcement learning (meta-RL) algorithms enable agents to learn new skills from small amounts of experience, several major challenges preclude their practicality. Current methods rely heavily on on-policy experience, limiting their sample efficiency, and lack mechanisms to reason about task uncertainty when identifying and learning new tasks, limiting their effectiveness in sparse reward problems. In this paper, we aim to address these challenges by developing an off-policy meta-RL algorithm based on online latent task inference. Our method can be interpreted as an implementation of online probabilistic filtering of latent task variables to infer how to solve a new task from small amounts of experience. This probabilistic interpretation also enables posterior sampling for structured exploration. Our method outperforms prior algorithms in asymptotic performance and sample efficiency on several meta-RL benchmarks.
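Below is a minimal sketch (not the authors' released code) of the idea described in the abstract: infer a probabilistic latent task variable z from collected experience, then condition the policy on a posterior sample so exploration is structured by task uncertainty. The encoder architecture, dimensions, and the permutation-invariant product-of-Gaussians combination used here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Maps each (s, a, r, s') transition to Gaussian factors over the latent task z."""
    def __init__(self, transition_dim: int, latent_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),   # per-transition mean and log-variance
        )

    def posterior(self, context: torch.Tensor) -> torch.distributions.Normal:
        """context: (num_transitions, transition_dim) experience collected on the current task."""
        mu, logvar = self.net(context).chunk(2, dim=-1)
        var = logvar.exp()
        # Combine per-transition Gaussian factors into a single posterior over z
        # (product of Gaussians), so the belief sharpens as more experience arrives.
        prec = 1.0 / var
        post_var = 1.0 / prec.sum(dim=0)
        post_mu = post_var * (prec * mu).sum(dim=0)
        return torch.distributions.Normal(post_mu, post_var.sqrt())

# Posterior sampling for structured exploration: sample one hypothesis z about the
# task, act on it for a rollout, then update the posterior with the new transitions.
encoder = ContextEncoder(transition_dim=10, latent_dim=5)
context = torch.randn(32, 10)             # placeholder for real collected transitions
z = encoder.posterior(context).rsample()  # one sampled task hypothesis
# action = policy(state, z)               # policy conditioned on z (not shown here)
```

Because z is inferred from a replay of collected transitions rather than from fresh on-policy rollouts, the policy and critic can be trained with an off-policy RL algorithm, which is the source of the sample-efficiency gains the abstract claims.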
