Poster
TGRL: An Algorithm for Teacher Guided Reinforcement Learning
Idan Shenfeld · Zhang-Wei Hong · Aviv Tamar · Pulkit Agrawal
We consider solving sequential decision-making problems in the scenario where the agent has access to two supervision sources: a $\textit{reward signal}$ and a $\textit{teacher}$ that can be queried to obtain a $\textit{good}$ action for any state encountered by the agent. Learning solely from rewards, or reinforcement learning, is data inefficient and may not learn high-reward policies in challenging scenarios involving sparse rewards or partial observability. On the other hand, learning from a teacher may sometimes be infeasible. For instance, the actions provided by a teacher with privileged information may be unlearnable by an agent with limited information (i.e., partial observability). In other scenarios, the teacher might be sub-optimal, and imitating its actions can limit the agent's performance. To overcome these challenges, prior work proposed to jointly optimize imitation and reinforcement learning objectives but relied on heuristics and problem-specific hyperparameter tuning to balance the two objectives. We introduce Teacher Guided Reinforcement Learning (TGRL), a principled approach to dynamically balance following the teacher's guidance and leveraging RL. TGRL outperforms strong baselines across diverse domains without hyperparameter tuning.
Author Information
Idan Shenfeld (MIT)
Zhang-Wei Hong (MIT)
Aviv Tamar (Technion)
Pulkit Agrawal (MIT)
More from the Same Authors
-
2021 : Topological Experience Replay for Fast Q-Learning »
Zhang-Wei Hong · Tao Chen · Yen-Chen Lin · Joni Pajarinen · Pulkit Agrawal -
2021 : Understanding the Generalization Gap in Visual Reinforcement Learning »
Anurag Ajay · Ge Yang · Ofir Nachum · Pulkit Agrawal -
2022 : Distributionally Adaptive Meta Reinforcement Learning »
Anurag Ajay · Dibya Ghosh · Sergey Levine · Pulkit Agrawal · Abhishek Gupta -
2023 : Visual Dexterity: In-hand Dexterous Manipulation from Depth »
Tao Chen · Megha Tippur · Siyang Wu · Vikash Kumar · Edward Adelson · Pulkit Agrawal -
2023 : Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-loop feedback »
Marcel Torne Villasevil · Max Balsells I Pamies · Zihan Wang · Samedh Desai · Tao Chen · Pulkit Agrawal · Abhishek Gupta -
2023 Poster: Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation »
Zechu Li · Tao Chen · Zhang-Wei Hong · Anurag Ajay · Pulkit Agrawal -
2023 Poster: Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation »
Andi Peng · Aviv Netanyahu · Mark Ho · Tianmin Shu · Andreea Bobu · Julie Shah · Pulkit Agrawal -
2023 Poster: Statistical Learning under Heterogenous Distribution Shift »
Max Simchowitz · Anurag Ajay · Pulkit Agrawal · Akshay Krishnamurthy -
2023 Poster: Learning Control by Iterative Inversion »
Gal Leibovich · Guy Jacob · Or Avner · Gal Novik · Aviv Tamar -
2023 Poster: Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks »
Minyoung Huh · Brian Cheung · Pulkit Agrawal · Phillip Isola -
2023 Poster: ContraBAR: Contrastive Bayes-Adaptive Deep RL »
Era Choshen · Aviv Tamar -
2022 Poster: Unsupervised Image Representation Learning with Deep Latent Particles »
Tal Daniel · Aviv Tamar -
2022 Poster: Discovering Generalizable Spatial Goal Representations via Graph-based Active Reward Learning »
Aviv Netanyahu · Tianmin Shu · Josh Tenenbaum · Pulkit Agrawal -
2022 Spotlight: Discovering Generalizable Spatial Goal Representations via Graph-based Active Reward Learning »
Aviv Netanyahu · Tianmin Shu · Josh Tenenbaum · Pulkit Agrawal -
2022 Spotlight: Unsupervised Image Representation Learning with Deep Latent Particles »
Tal Daniel · Aviv Tamar -
2022 Poster: Offline RL Policies Should Be Trained to be Adaptive »
Dibya Ghosh · Anurag Ajay · Pulkit Agrawal · Sergey Levine -
2022 Oral: Offline RL Policies Should Be Trained to be Adaptive »
Dibya Ghosh · Anurag Ajay · Pulkit Agrawal · Sergey Levine -
2021 Workshop: Self-Supervised Learning for Reasoning and Perception »
Pengtao Xie · Shanghang Zhang · Ishan Misra · Pulkit Agrawal · Katerina Fragkiadaki · Ruisi Zhang · Tassilo Klein · Asli Celikyilmaz · Mihaela van der Schaar · Eric Xing -
2021 Poster: Learning Task Informed Abstractions »
Xiang Fu · Ge Yang · Pulkit Agrawal · Tommi Jaakkola -
2021 Spotlight: Learning Task Informed Abstractions »
Xiang Fu · Ge Yang · Pulkit Agrawal · Tommi Jaakkola -
2020 Poster: Hallucinative Topological Memory for Zero-Shot Visual Planning »
Kara Liu · Thanard Kurutach · Christine Tung · Pieter Abbeel · Aviv Tamar -
2020 Poster: Sub-Goal Trees -- a Framework for Goal-Based Reinforcement Learning »
Tom Jurgenson · Or Avner · Edward Groshev · Aviv Tamar -
2019 Poster: Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN »
Dror Freirich · Tzahi Shimkin · Ron Meir · Aviv Tamar -
2019 Poster: A Deep Reinforcement Learning Perspective on Internet Congestion Control »
Nathan Jay · Noga H. Rotman · Brighten Godfrey · Michael Schapira · Aviv Tamar -
2019 Oral: Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN »
Dror Freirich · Tzahi Shimkin · Ron Meir · Aviv Tamar -
2019 Oral: A Deep Reinforcement Learning Perspective on Internet Congestion Control »
Nathan Jay · Noga H. Rotman · Brighten Godfrey · Michael Schapira · Aviv Tamar -
2018 Poster: Investigating Human Priors for Playing Video Games »
Rachit Dubey · Pulkit Agrawal · Deepak Pathak · Tom Griffiths · Alexei Efros -
2018 Oral: Investigating Human Priors for Playing Video Games »
Rachit Dubey · Pulkit Agrawal · Deepak Pathak · Tom Griffiths · Alexei Efros -
2017 Poster: Curiosity-driven Exploration by Self-supervised Prediction »
Deepak Pathak · Pulkit Agrawal · Alexei Efros · Trevor Darrell -
2017 Poster: Constrained Policy Optimization »
Joshua Achiam · David Held · Aviv Tamar · Pieter Abbeel -
2017 Talk: Curiosity-driven Exploration by Self-supervised Prediction »
Deepak Pathak · Pulkit Agrawal · Alexei Efros · Trevor Darrell -
2017 Talk: Constrained Policy Optimization »
Joshua Achiam · David Held · Aviv Tamar · Pieter Abbeel