CLUTR: Curriculum Learning via Unsupervised Task Representation Learning
Abdus Salam Azad · Izzeddin Gur · Jasper Emhoff · Nathaniel Alexis · Aleksandra Faust · Pieter Abbeel · Ion Stoica

Tue Jul 25 05:00 PM -- 06:30 PM (PDT) @ Exhibit Hall 1 #643

Reinforcement Learning (RL) algorithms are often known for sample inefficiency and difficulty generalizing. Recently, Unsupervised Environment Design (UED) emerged as a new paradigm for zero-shot generalization by simultaneously learning a task distribution and agent policies on the generated tasks. This is a non-stationary process in which the task distribution evolves along with the agent policies, creating instability over time. While past works demonstrated the potential of such approaches, sampling effectively from the task space remains an open challenge that bottlenecks them. To this end, we introduce CLUTR: a novel unsupervised curriculum learning algorithm that decouples task representation and curriculum learning into a two-stage optimization. It first trains a recurrent variational autoencoder on randomly generated tasks to learn a latent task manifold. Next, a teacher agent creates a curriculum by maximizing a minimax regret-based objective on a set of latent tasks sampled from this manifold. Using the fixed, pretrained task manifold, we show that CLUTR successfully overcomes the non-stationarity problem and improves stability. Our experimental results show that CLUTR outperforms PAIRED, a principled and popular UED method, in the challenging CarRacing and navigation environments, achieving 10.6X and 45% improvement in zero-shot generalization, respectively. CLUTR also performs comparably to the non-UED state-of-the-art for CarRacing, while requiring 500X fewer environment interactions. We open source our code at https://github.com/clutr/clutr.
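
As a rough illustration of the regret signal mentioned in the abstract, the sketch below uses the estimator commonly associated with PAIRED: the best antagonist return minus the mean protagonist return on the same task. The function names, the toy return model, and the argmax selection are all hypothetical stand-ins and are not taken from the CLUTR code base; in particular, CLUTR's teacher is a learned agent, whereas this sketch simply picks the highest-regret candidate from a fixed list of latent vectors.

```python
def estimate_regret(protagonist_returns, antagonist_returns):
    """PAIRED-style estimate: best antagonist return minus mean protagonist return."""
    mean_protagonist = sum(protagonist_returns) / len(protagonist_returns)
    return max(antagonist_returns) - mean_protagonist


def toy_returns(latent_task):
    """Hypothetical return model: both agents do worse on 'harder' latents,
    but the antagonist degrades more slowly than the protagonist."""
    difficulty = sum(abs(x) for x in latent_task)
    protagonist = [1.0 - difficulty, 1.0 - difficulty]
    antagonist = [1.0 - 0.5 * difficulty, 1.0 - 0.75 * difficulty]
    return protagonist, antagonist


def pick_highest_regret(latent_tasks):
    """Stand-in for the teacher: choose the candidate latent with the largest estimated regret."""
    return max(latent_tasks, key=lambda z: estimate_regret(*toy_returns(z)))


if __name__ == "__main__":
    # Candidate latent task vectors (assumed 2-D here purely for illustration).
    candidates = [(0.0, 0.0), (0.5, 0.0), (1.0, 1.0)]
    print(pick_highest_regret(candidates))
```

Under this toy return model, regret grows with the magnitude of the latent vector, so the selection favors tasks on which the antagonist still succeeds while the protagonist struggles.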

Author Information

Abdus Salam Azad (University of California, Berkeley)
Izzeddin Gur (Google)
Jasper Emhoff (University of California, Berkeley)
Nathaniel Alexis (University of California, Berkeley)
Aleksandra Faust (Google Brain)

Aleksandra Faust is a Staff Research Scientist at Google Brain Robotics, leading the Task and Motion Planning research group. Previously, Aleksandra led machine learning efforts for self-driving car planning and controls at Waymo, and was a researcher at Sandia National Laboratories. She earned a Ph.D. in Computer Science at the University of New Mexico, a Master's in Computer Science from the University of Illinois at Urbana-Champaign, and a Bachelor's in Math with a minor in Computer Science from the University of Belgrade. Her research interests include machine learning for safe, scalable, and socially-aware motion planning, decision-making, and robot behavior. Aleksandra won the Tom L. Popejoy Award for the best doctoral dissertation at the University of New Mexico in STEM in the period of 2011-2014, and was named Distinguished Alumna by the University of New Mexico School of Engineering. Her work has been featured in the New York Times, PC Magazine, and ZDNet, and was awarded Best Paper in Service Robotics at ICRA 2018.

Pieter Abbeel (UC Berkeley & Covariant)
Ion Stoica (University of California, Berkeley)

More from the Same Authors