SparseDice: Imitation Learning for Temporally Sparse Data via Regularization
Alberto Camacho · Izzeddin Gur · Marcin Moczulski · Ofir Nachum · Aleksandra Faust

Imitation learning learns how to act by observing the behavior of an expert demonstrator. We are concerned with a setting where the demonstrations comprise only a subset of state-action pairs (as opposed to whole trajectories). Our setup reflects the limitations that real-world problems face when accessing expert data. For example, user logs may contain incomplete traces of behavior, and in robotics, non-technical human demonstrators may describe trajectories using only a subset of all state-action pairs. A recent approach to imitation learning via distribution matching, ValueDice, tends to overfit when demonstrations are temporally sparse. We counter the overfitting by contributing regularization losses. Our empirical evaluation on MuJoCo benchmarks shows that we can successfully learn from very sparse and scarce expert data. Moreover, (i) the quality of the learned policies is often comparable to that of policies learned with full expert trajectories, and (ii) the number of training steps required to learn from sparse data is similar to the number required when the agent has access to full expert trajectories.
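To make the notion of temporally sparse demonstrations concrete, the following is a minimal sketch of how a full expert trajectory might be reduced to a subset of its state-action pairs. It is an illustration only, not the data pipeline used in the paper: the function name `sparsify_trajectory`, the `keep_fraction` parameter, the uniform random subsampling, and the toy scalar states and actions are all assumptions made for this example.

```python
import random


def sparsify_trajectory(trajectory, keep_fraction, seed=0):
    """Keep a random subset of (state, action) pairs, preserving time order.

    Hypothetical helper for illustration; uniform random subsampling is an
    assumption, not necessarily how the paper builds its sparse datasets.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not trajectory:
        return []
    rng = random.Random(seed)
    # Always keep at least one pair so the demonstration is never empty.
    n_keep = max(1, round(len(trajectory) * keep_fraction))
    kept_indices = sorted(rng.sample(range(len(trajectory)), n_keep))
    return [trajectory[i] for i in kept_indices]


# Toy expert trajectory: 100 (state, action) pairs with scalar placeholders.
full_trajectory = [(float(t), -float(t)) for t in range(100)]

# Temporally sparse demonstration: only 5% of the pairs remain, and the
# transitions between consecutive kept pairs are no longer observed.
sparse_demo = sparsify_trajectory(full_trajectory, keep_fraction=0.05)
print(len(sparse_demo))
```

The kept pairs are generally not adjacent in time, so the learner sees isolated samples of expert behavior rather than complete transitions, which is the regime the abstract describes.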

Author Information

Alberto Camacho (University of Toronto)
Izzeddin Gur (Google)
Marcin Moczulski (Google Brain)
Ofir Nachum (Google Brain)
Aleksandra Faust (Google Brain)

Aleksandra Faust is a Staff Research Scientist at Google Brain Robotics, leading the Task and Motion Planning research group. Previously, Aleksandra led machine learning efforts for self-driving car planning and controls at Waymo, and was a researcher at Sandia National Laboratories. She earned a Ph.D. in Computer Science at the University of New Mexico, a Master's in Computer Science from the University of Illinois at Urbana-Champaign, and a Bachelor's in Math with a minor in Computer Science from the University of Belgrade. Her research interests include machine learning for safe, scalable, and socially-aware motion planning, decision-making, and robot behavior. Aleksandra won the Tom L. Popejoy Award for the best doctoral dissertation in STEM at the University of New Mexico for the period 2011-2014, and was named Distinguished Alumna by the University of New Mexico School of Engineering. Her work has been featured in the New York Times, PC Magazine, and ZDNet, and was awarded Best Paper in Service Robotics at ICRA 2018.

More from the Same Authors