
Hierarchical Imitation and Reinforcement Learning
Hoang Le · Nan Jiang · Alekh Agarwal · Miro Dudik · Yisong Yue · Hal Daume

Fri Jul 13 09:15 AM -- 12:00 PM (PDT) @ Hall B #15

We study how to effectively leverage expert feedback to learn sequential decision-making policies. We focus on problems with sparse rewards and long time horizons, which typically pose significant challenges in reinforcement learning. We propose an algorithmic framework, called hierarchical guidance, that leverages the hierarchical structure of the underlying problem to integrate different modes of expert interaction. Our framework can incorporate different combinations of imitation learning (IL) and reinforcement learning (RL) at different levels, leading to dramatic reductions in both expert effort and cost of exploration. Using long-horizon benchmarks, including Montezuma's Revenge, we demonstrate that our approach can learn significantly faster than hierarchical RL, and be significantly more label-efficient than standard IL. We also theoretically analyze labeling cost for certain instantiations of our framework.

Author Information

Hoang Le (Caltech)

Hoang M. Le is a PhD Candidate in the Computing and Mathematical Sciences Department at the California Institute of Technology. He received an M.S. in Cognitive Systems and Interactive Media from the Universitat Pompeu Fabra, Barcelona, Spain, and a B.A. in Mathematics from Bucknell University in Lewisburg, PA. He is a recipient of an Amazon AI Fellowship. Hoang’s research focuses on the theory and applications of sequential decision making, with a strong focus on imitation learning. He has broad familiarity with the latest advances in imitation learning techniques and applications. His own research in imitation learning blends principled new techniques with a diverse range of application domains. In addition to popular reinforcement learning domains such as maze navigation and Atari games, his prior work on imitation learning has been applied to learning human behavior in team sports and developing automatic camera broadcasting systems.

Nan Jiang (Microsoft Research)
Alekh Agarwal (Microsoft Research)
Miro Dudik (Microsoft Research)

Miroslav Dudík is a Senior Principal Researcher in machine learning at Microsoft Research, NYC. His research focuses on combining theoretical and applied aspects of machine learning, statistics, convex optimization, and algorithms. Most recently he has worked on contextual bandits, reinforcement learning, and algorithmic fairness. He received his PhD from Princeton in 2007. He is a co-creator of the Fairlearn toolkit for assessing and improving the fairness of machine learning models and of the Maxent package for modeling species distributions, which is used by biologists around the world to design national parks, model the impacts of climate change, and discover new species.

Yisong Yue (Caltech)

Yisong Yue is an assistant professor in the Computing and Mathematical Sciences Department at the California Institute of Technology. He was previously a research scientist at Disney Research. Before that, he was a postdoctoral researcher in the Machine Learning Department and the iLab at Carnegie Mellon University. He received a Ph.D. from Cornell University and a B.S. from the University of Illinois at Urbana-Champaign. Yisong's research interests lie primarily in the theory and application of statistical machine learning. He is particularly interested in developing novel methods for interactive machine learning and structured prediction. In the past, his research has been applied to information retrieval, recommender systems, text classification, learning from rich user interfaces, analyzing implicit human feedback, data-driven animation, behavior analysis, sports analytics, policy learning in robotics, and adaptive planning & allocation problems.

Hal Daume (Microsoft Research)
