Hierarchical Imitation and Reinforcement Learning »
We study how to effectively leverage expert feedback to learn sequential decision-making policies. We focus on problems with sparse rewards and long time horizons, which typically pose significant challenges in reinforcement learning. We propose an algorithmic framework, called hierarchical guidance, that leverages the hierarchical structure of the underlying problem to integrate different modes of expert interaction. Our framework can incorporate different combinations of imitation learning (IL) and reinforcement learning (RL) at different levels, leading to dramatic reductions in both expert effort and cost of exploration. Using long-horizon benchmarks, including Montezuma's Revenge, we demonstrate that our approach can learn significantly faster than hierarchical RL, and be significantly more label-efficient than standard IL. We also theoretically analyze labeling cost for certain instantiations of our framework.
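The abstract describes one instantiation of hierarchical guidance: imitation learning at the high level (an expert labels which subgoal to pursue) combined with reinforcement learning at the low level (each subgoal policy is trained with an intrinsic subgoal-reaching reward). The following is a minimal toy sketch of that idea, not the paper's algorithm; the environment, the subgoal states 5 and 9, and all names (`ChainEnv`, `train_low`, `high_policy`) are invented for illustration.

```python
import random

random.seed(0)

class ChainEnv:
    """Toy 10-state chain: start at state 0, goal at state 9."""
    def __init__(self, n=10):
        self.n = n
        self.s = 0
    def reset(self, s=0):
        self.s = s
        return self.s
    def step(self, a):  # a in {+1, -1}
        self.s = max(0, min(self.n - 1, self.s + a))
        return self.s, self.s == self.n - 1

ACTIONS = (1, -1)
SUBGOALS = (5, 9)

# High level, trained by IL: the expert labels each state with the next
# subgoal (pass through state 5, then head for state 9). On a 10-state
# space, behavioral cloning of these labels reduces to a lookup table.
expert_subgoal = lambda s: 5 if s < 5 else 9
high_policy = {s: expert_subgoal(s) for s in range(10)}

# Low level, trained by RL: one tabular Q-function per subgoal, with an
# intrinsic reward of 1 for reaching that subgoal (no expert labels needed).
q = {g: {(s, a): 0.0 for s in range(10) for a in ACTIONS} for g in SUBGOALS}

def train_low(env, g, episodes=500, eps=0.3, alpha=0.5, gamma=0.9):
    """Epsilon-greedy tabular Q-learning toward subgoal g."""
    for _ in range(episodes):
        s = env.reset(random.randrange(env.n))  # random starts aid exploration
        for _ in range(20):
            if s == g:
                break
            a = random.choice(ACTIONS) if random.random() < eps else \
                max(ACTIONS, key=lambda b: q[g][(s, b)])
            s2, _ = env.step(a)
            reward = 1.0 if s2 == g else 0.0
            target = reward + gamma * max(q[g][(s2, b)] for b in ACTIONS)
            q[g][(s, a)] += alpha * (target - q[g][(s, a)])
            s = s2

env = ChainEnv()
for g in SUBGOALS:
    train_low(env, g)

# Hierarchical execution: the IL-trained high level picks the subgoal,
# and the RL-trained low level acts greedily toward it.
s, steps = env.reset(), 0
while s != env.n - 1 and steps < 50:
    g = high_policy[s]
    s, _ = env.step(max(ACTIONS, key=lambda b: q[g][(s, b)]))
    steps += 1
print(steps)  # number of steps to reach the end of the chain
```

The point of the decomposition is that the expert only supplies a handful of subgoal labels, while the costly low-level exploration is confined to short, densely rewarded subgoal-reaching problems.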
Author Information
Hoang Le (Caltech)
Hoang M. Le is a PhD Candidate in the Computing and Mathematical Sciences Department at the California Institute of Technology. He received an M.S. in Cognitive Systems and Interactive Media from Universitat Pompeu Fabra in Barcelona, Spain, and a B.A. in Mathematics from Bucknell University in Lewisburg, PA. He is a recipient of an Amazon AI Fellowship. Hoang’s research focuses on the theory and applications of sequential decision making, with an emphasis on imitation learning, blending principled new techniques with a diverse range of application domains. In addition to popular reinforcement learning domains such as maze navigation and Atari games, his prior work on imitation learning has been applied to modeling human behavior in team sports and developing automatic camera broadcasting systems.
Nan Jiang (Microsoft Research)
Alekh Agarwal (Microsoft Research)
Miroslav Dudik (Microsoft Research)

Miroslav Dudík is a Senior Principal Researcher in machine learning at Microsoft Research, NYC. His research focuses on combining theoretical and applied aspects of machine learning, statistics, convex optimization, and algorithms. Most recently he has worked on contextual bandits, reinforcement learning, and algorithmic fairness. He received his PhD from Princeton in 2007. He is a co-creator of the Fairlearn toolkit for assessing and improving the fairness of machine learning models and of the Maxent package for modeling species distributions, which is used by biologists around the world to design national parks, model the impacts of climate change, and discover new species.
Yisong Yue (Caltech)

Yisong Yue is a Professor of Computing and Mathematical Sciences at Caltech and (via sabbatical) a Principal Scientist at Latitude AI. His research interests span both fundamental and applied pursuits, from novel learning-theoretic frameworks all the way to deep learning deployed in autonomous driving on public roads. His work has been recognized with multiple paper awards and nominations, including in robotics, computer vision, sports analytics, machine learning for health, and information retrieval. At Latitude AI, he is working on machine learning approaches to motion planning for autonomous driving.
Hal Daumé III (Microsoft Research)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Oral: Hierarchical Imitation and Reinforcement Learning »
  Fri. Jul 13th 09:00 -- 09:20 AM, Room A1
More from the Same Authors
- 2021: Provably efficient exploration-free transfer RL for near-deterministic latent dynamics »
  Yao Liu · Dipendra Misra · Miroslav Dudik · Robert Schapire
- 2023: Preferential Multi-Attribute Bayesian Optimization with Application to Exoskeleton Personalization »
  Raul Astudillo · Amy Li · Maegan Tucker · Chu Xin Cheng · Aaron Ames · Yisong Yue
- 2023: Dueling Bandits for Online Preference Learning »
  Yisong Yue
- 2023 Poster: Learning Regions of Interest for Bayesian Optimization with Adaptive Level-Set Estimation »
  Fengxue Zhang · Jialin Song · James Bowden · Alexander Ladd · Yisong Yue · Thomas Desautels · Yuxin Chen
- 2023 Poster: MABe22: A Multi-Species Multi-Task Benchmark for Learned Representations of Behavior »
  Jennifer J. Sun · Markus Marks · Andrew Ulmer · Dipam Chakraborty · Brian Geuther · Edward Hayes · Heng Jia · Vivek Kumar · Sebastian Oleszko · Zachary Partridge · Milan Peelman · Alice Robie · Catherine Schretter · Keith Sheppard · Chao Sun · Param Uttarwar · Julian Wagner · Erik Werner · Joseph Parker · Pietro Perona · Yisong Yue · Kristin Branson · Ann Kennedy
- 2023 Poster: Eventual Discounting Temporal Logic Counterfactual Experience Replay »
  Cameron Voloshin · Abhinav Verma · Yisong Yue
- 2022 Workshop: Adaptive Experimental Design and Active Learning in the Real World »
  Mojmir Mutny · Willie Neiswanger · Ilija Bogunovic · Stefano Ermon · Yisong Yue · Andreas Krause
- 2022 Poster: Investigating Generalization by Controlling Normalized Margin »
  Alexander Farhang · Jeremy Bernstein · Kushal Tirumala · Yang Liu · Yisong Yue
- 2022 Spotlight: Investigating Generalization by Controlling Normalized Margin »
  Alexander Farhang · Jeremy Bernstein · Kushal Tirumala · Yang Liu · Yisong Yue
- 2022 Poster: Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning »
  Alberto Bietti · Chen-Yu Wei · Miroslav Dudik · John Langford · Steven Wu
- 2022 Spotlight: Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning »
  Alberto Bietti · Chen-Yu Wei · Miroslav Dudik · John Langford · Steven Wu
- 2022 Poster: LyaNet: A Lyapunov Framework for Training Neural ODEs »
  Ivan Dario Jimenez Rodriguez · Aaron Ames · Yisong Yue
- 2022 Spotlight: LyaNet: A Lyapunov Framework for Training Neural ODEs »
  Ivan Dario Jimenez Rodriguez · Aaron Ames · Yisong Yue
- 2021: RL + Recommender Systems Panel »
  Alekh Agarwal · Ed Chi · Maria Dimakopoulou · Georgios Theocharous · Minmin Chen · Lihong Li
- 2021: Personalized Preference Learning - from Spinal Cord Stimulation to Exoskeletons »
  Yisong Yue
- 2021 Poster: Learning by Turning: Neural Architecture Aware Optimisation »
  Yang Liu · Jeremy Bernstein · Markus Meister · Yisong Yue
- 2021 Spotlight: Learning by Turning: Neural Architecture Aware Optimisation »
  Yang Liu · Jeremy Bernstein · Markus Meister · Yisong Yue
- 2021 Poster: Interactive Learning from Activity Description »
  Khanh Nguyen · Dipendra Misra · Robert Schapire · Miroslav Dudik · Patrick Shafto
- 2021 Spotlight: Interactive Learning from Activity Description »
  Khanh Nguyen · Dipendra Misra · Robert Schapire · Miroslav Dudik · Patrick Shafto
- 2021: Conclusions »
  Kate Crawford · Hal Daumé III
- 2021: Political and Legal Implications »
  Hal Daumé III · Kate Crawford
- 2021: Environmental Implications »
  Kate Crawford · Hal Daumé III
- 2021: Social Aspects »
  Kate Crawford · Hal Daumé III
- 2021: Economic Implications »
  Hal Daumé III · Kate Crawford
- 2021 Tutorial: Social Implications of Large Language Models »
  Hal Daumé III · Kate Crawford
- 2021: Introduction »
  Kate Crawford · Hal Daumé III
- 2020 Workshop: Real World Experiment Design and Active Learning »
  Ilija Bogunovic · Willie Neiswanger · Yisong Yue
- 2020 Poster: Doubly robust off-policy evaluation with shrinkage »
  Yi Su · Maria Dimakopoulou · Akshay Krishnamurthy · Miroslav Dudik
- 2020 Poster: Learning Calibratable Policies using Programmatic Style-Consistency »
  Eric Zhan · Albert Tseng · Yisong Yue · Adith Swaminathan · Matthew Hausknecht
- 2020 Poster: Multiresolution Tensor Learning for Efficient and Interpretable Spatial Analysis »
  Jung Yeon Park · Kenneth Carr · Stephan Zheng · Yisong Yue · Rose Yu
- 2019: Miro Dudík (Microsoft Research) - Doubly Robust Off-policy Evaluation with Shrinkage »
  Miroslav Dudik
- 2019 Workshop: Real-world Sequential Decision Making: Reinforcement Learning and Beyond »
  Hoang Le · Yisong Yue · Adith Swaminathan · Byron Boots · Ching-An Cheng
- 2019 Poster: Batch Policy Learning under Constraints »
  Hoang Le · Cameron Voloshin · Yisong Yue
- 2019 Poster: Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback »
  Chicheng Zhang · Alekh Agarwal · Hal Daumé III · John Langford · Sahand Negahban
- 2019 Poster: Non-Monotonic Sequential Text Generation »
  Sean Welleck · Kiante Brantley · Hal Daumé III · Kyunghyun Cho
- 2019 Poster: Fair Regression: Quantitative Definitions and Reduction-Based Algorithms »
  Alekh Agarwal · Miroslav Dudik · Steven Wu
- 2019 Oral: Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback »
  Chicheng Zhang · Alekh Agarwal · Hal Daumé III · John Langford · Sahand Negahban
- 2019 Oral: Non-Monotonic Sequential Text Generation »
  Sean Welleck · Kiante Brantley · Hal Daumé III · Kyunghyun Cho
- 2019 Oral: Fair Regression: Quantitative Definitions and Reduction-Based Algorithms »
  Alekh Agarwal · Miroslav Dudik · Steven Wu
- 2019 Oral: Batch Policy Learning under Constraints »
  Hoang Le · Cameron Voloshin · Yisong Yue
- 2019 Poster: Control Regularization for Reduced Variance Reinforcement Learning »
  Richard Cheng · Abhinav Verma · Gabor Orosz · Swarat Chaudhuri · Yisong Yue · Joel Burdick
- 2019 Oral: Control Regularization for Reduced Variance Reinforcement Learning »
  Richard Cheng · Abhinav Verma · Gabor Orosz · Swarat Chaudhuri · Yisong Yue · Joel Burdick
- 2019 Poster: Provably efficient RL with Rich Observations via Latent State Decoding »
  Simon Du · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal · Miroslav Dudik · John Langford
- 2019 Poster: Contextual Memory Trees »
  Wen Sun · Alina Beygelzimer · Hal Daumé III · John Langford · Paul Mineiro
- 2019 Oral: Provably efficient RL with Rich Observations via Latent State Decoding »
  Simon Du · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal · Miroslav Dudik · John Langford
- 2019 Oral: Contextual Memory Trees »
  Wen Sun · Alina Beygelzimer · Hal Daumé III · John Langford · Paul Mineiro
- 2018 Poster: Iterative Amortized Inference »
  Joe Marino · Yisong Yue · Stephan Mandt
- 2018 Poster: A Reductions Approach to Fair Classification »
  Alekh Agarwal · Alina Beygelzimer · Miroslav Dudik · John Langford · Hanna Wallach
- 2018 Oral: Iterative Amortized Inference »
  Joe Marino · Yisong Yue · Stephan Mandt
- 2018 Oral: A Reductions Approach to Fair Classification »
  Alekh Agarwal · Alina Beygelzimer · Miroslav Dudik · John Langford · Hanna Wallach
- 2018 Poster: Practical Contextual Bandits with Regression Oracles »
  Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire
- 2018 Oral: Practical Contextual Bandits with Regression Oracles »
  Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire
- 2018 Poster: Stagewise Safe Bayesian Optimization with Gaussian Processes »
  Yanan Sui · Vincent Zhuang · Joel Burdick · Yisong Yue
- 2018 Oral: Stagewise Safe Bayesian Optimization with Gaussian Processes »
  Yanan Sui · Vincent Zhuang · Joel Burdick · Yisong Yue
- 2018 Tutorial: Imitation Learning »
  Yisong Yue · Hoang Le
- 2017: Corralling a Band of Bandit Algorithms »
  Alekh Agarwal
- 2017 Poster: Contextual Decision Processes with low Bellman rank are PAC-Learnable »
  Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire
- 2017 Poster: Optimal and Adaptive Off-policy Evaluation in Contextual Bandits »
  Yu-Xiang Wang · Alekh Agarwal · Miroslav Dudik
- 2017 Poster: Coordinated Multi-Agent Imitation Learning »
  Hoang Le · Yisong Yue · Peter Carr · Patrick Lucey
- 2017 Talk: Coordinated Multi-Agent Imitation Learning »
  Hoang Le · Yisong Yue · Peter Carr · Patrick Lucey
- 2017 Talk: Contextual Decision Processes with low Bellman rank are PAC-Learnable »
  Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire
- 2017 Talk: Optimal and Adaptive Off-policy Evaluation in Contextual Bandits »
  Yu-Xiang Wang · Alekh Agarwal · Miroslav Dudik
- 2017 Poster: Active Learning for Cost-Sensitive Classification »
  Akshay Krishnamurthy · Alekh Agarwal · Tzu-Kuo Huang · Hal Daumé III · John Langford
- 2017 Talk: Active Learning for Cost-Sensitive Classification »
  Akshay Krishnamurthy · Alekh Agarwal · Tzu-Kuo Huang · Hal Daumé III · John Langford
- 2017 Tutorial: Real World Interactive Learning »
  Alekh Agarwal · John Langford