We study how to effectively leverage expert feedback to learn sequential decision-making policies. We focus on problems with sparse rewards and long time horizons, which typically pose significant challenges in reinforcement learning. We propose an algorithmic framework, called hierarchical guidance, that leverages the hierarchical structure of the underlying problem to integrate different modes of expert interaction. Our framework can incorporate different combinations of imitation learning (IL) and reinforcement learning (RL) at different levels, leading to dramatic reductions in both expert effort and cost of exploration. Using long-horizon benchmarks, including Montezuma's Revenge, we demonstrate that our approach can learn significantly faster than hierarchical RL, and be significantly more label-efficient than standard IL. We also theoretically analyze labeling cost for certain instantiations of our framework.
Author Information
Hoang Le (Caltech)
Hoang M. Le is a PhD Candidate in the Computing and Mathematical Sciences Department at the California Institute of Technology. He received an M.S. in Cognitive Systems and Interactive Media from the Universitat Pompeu Fabra, Barcelona, Spain, and a B.A. in Mathematics from Bucknell University in Lewisburg, PA. He is a recipient of an Amazon AI Fellowship. Hoang’s research focuses on the theory and applications of sequential decision making, with a strong focus on imitation learning. He has broad familiarity with the latest advances in imitation learning techniques and applications. His own research in imitation learning blends principled new techniques with a diverse range of application domains. In addition to popular reinforcement learning domains such as maze navigation and Atari games, his prior work on imitation learning has been applied to learning human behavior in team sports and developing automatic camera broadcasting systems.
Nan Jiang (Microsoft Research)
Alekh Agarwal (Microsoft Research)
Miroslav Dudik (Microsoft Research)
Yisong Yue (Caltech)
Yisong Yue is an assistant professor in the Computing and Mathematical Sciences Department at the California Institute of Technology. He was previously a research scientist at Disney Research. Before that, he was a postdoctoral researcher in the Machine Learning Department and the iLab at Carnegie Mellon University. He received a Ph.D. from Cornell University and a B.S. from the University of Illinois at Urbana-Champaign. Yisong's research interests lie primarily in the theory and application of statistical machine learning. He is particularly interested in developing novel methods for interactive machine learning and structured prediction. In the past, his research has been applied to information retrieval, recommender systems, text classification, learning from rich user interfaces, analyzing implicit human feedback, data-driven animation, behavior analysis, sports analytics, policy learning in robotics, and adaptive planning & allocation problems.
Hal Daume (Microsoft Research)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Oral: Hierarchical Imitation and Reinforcement Learning
  Fri Jul 13th 09:00 -- 09:20 AM Room A1
More from the Same Authors
- 2020 Workshop: Real World Experiment Design and Active Learning
  Ilija Bogunovic · Willie Neiswanger · Yisong Yue
- 2020 Poster: Doubly robust off-policy evaluation with shrinkage
  Yi Su · Maria Dimakopoulou · Akshay Krishnamurthy · Miroslav Dudik
- 2020 Poster: Learning Calibratable Policies using Programmatic Style-Consistency
  Eric Zhan · Albert Tseng · Yisong Yue · Adith Swaminathan · Matthew Hausknecht
- 2020 Poster: Multiresolution Tensor Learning for Efficient and Interpretable Spatial Analysis
  Jung Yeon Park · Kenneth Carr · Stephan Zheng · Yisong Yue · Rose Yu
- 2019 Workshop: Real-world Sequential Decision Making: Reinforcement Learning and Beyond
  Hoang Le · Yisong Yue · Adith Swaminathan · Byron Boots · Ching-An Cheng
- 2019 Poster: Batch Policy Learning under Constraints
  Hoang Le · Cameron Voloshin · Yisong Yue
- 2019 Poster: Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback
  Chicheng Zhang · Alekh Agarwal · Hal Daume · John Langford · Sahand Negahban
- 2019 Poster: Non-Monotonic Sequential Text Generation
  Sean Welleck · Kiante Brantley · Hal Daume · Kyunghyun Cho
- 2019 Poster: Fair Regression: Quantitative Definitions and Reduction-Based Algorithms
  Alekh Agarwal · Miroslav Dudik · Steven Wu
- 2019 Oral: Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback
  Chicheng Zhang · Alekh Agarwal · Hal Daume · John Langford · Sahand Negahban
- 2019 Oral: Non-Monotonic Sequential Text Generation
  Sean Welleck · Kiante Brantley · Hal Daume · Kyunghyun Cho
- 2019 Oral: Fair Regression: Quantitative Definitions and Reduction-Based Algorithms
  Alekh Agarwal · Miroslav Dudik · Steven Wu
- 2019 Oral: Batch Policy Learning under Constraints
  Hoang Le · Cameron Voloshin · Yisong Yue
- 2019 Poster: Control Regularization for Reduced Variance Reinforcement Learning
  Richard Cheng · Abhinav Verma · Gabor Orosz · Swarat Chaudhuri · Yisong Yue · Joel Burdick
- 2019 Oral: Control Regularization for Reduced Variance Reinforcement Learning
  Richard Cheng · Abhinav Verma · Gabor Orosz · Swarat Chaudhuri · Yisong Yue · Joel Burdick
- 2019 Poster: Provably efficient RL with Rich Observations via Latent State Decoding
  Simon Du · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal · Miroslav Dudik · John Langford
- 2019 Poster: Contextual Memory Trees
  Wen Sun · Alina Beygelzimer · Hal Daume · John Langford · Paul Mineiro
- 2019 Oral: Provably efficient RL with Rich Observations via Latent State Decoding
  Simon Du · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal · Miroslav Dudik · John Langford
- 2019 Oral: Contextual Memory Trees
  Wen Sun · Alina Beygelzimer · Hal Daume · John Langford · Paul Mineiro
- 2018 Poster: Iterative Amortized Inference
  Joe Marino · Yisong Yue · Stephan Mandt
- 2018 Poster: A Reductions Approach to Fair Classification
  Alekh Agarwal · Alina Beygelzimer · Miroslav Dudik · John Langford · Hanna Wallach
- 2018 Oral: Iterative Amortized Inference
  Joe Marino · Yisong Yue · Stephan Mandt
- 2018 Oral: A Reductions Approach to Fair Classification
  Alekh Agarwal · Alina Beygelzimer · Miroslav Dudik · John Langford · Hanna Wallach
- 2018 Poster: Practical Contextual Bandits with Regression Oracles
  Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire
- 2018 Oral: Practical Contextual Bandits with Regression Oracles
  Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire
- 2018 Poster: Stagewise Safe Bayesian Optimization with Gaussian Processes
  Yanan Sui · Vincent Zhuang · Joel Burdick · Yisong Yue
- 2018 Oral: Stagewise Safe Bayesian Optimization with Gaussian Processes
  Yanan Sui · Vincent Zhuang · Joel Burdick · Yisong Yue
- 2018 Tutorial: Imitation Learning
  Yisong Yue · Hoang Le
- 2017 Poster: Contextual Decision Processes with low Bellman rank are PAC-Learnable
  Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire
- 2017 Poster: Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
  Yu-Xiang Wang · Alekh Agarwal · Miroslav Dudik
- 2017 Poster: Coordinated Multi-Agent Imitation Learning
  Hoang Le · Yisong Yue · Peter Carr · Patrick Lucey
- 2017 Talk: Coordinated Multi-Agent Imitation Learning
  Hoang Le · Yisong Yue · Peter Carr · Patrick Lucey
- 2017 Talk: Contextual Decision Processes with low Bellman rank are PAC-Learnable
  Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire
- 2017 Talk: Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
  Yu-Xiang Wang · Alekh Agarwal · Miroslav Dudik
- 2017 Poster: Active Learning for Cost-Sensitive Classification
  Akshay Krishnamurthy · Alekh Agarwal · Tzu-Kuo Huang · Hal Daumé III · John Langford
- 2017 Talk: Active Learning for Cost-Sensitive Classification
  Akshay Krishnamurthy · Alekh Agarwal · Tzu-Kuo Huang · Hal Daumé III · John Langford
- 2017 Tutorial: Real World Interactive Learning
  Alekh Agarwal · John Langford