Policies often fail at test time due to distribution shifts: changes in state and reward that arise when an end user deploys the policy in environments different from those seen during training. Data augmentation can make models more robust to such shifts by varying specific concepts in the state, e.g., object color, that are task-irrelevant and should not affect the desired actions. However, the designers training the agent often do not know a priori which concepts are irrelevant. We propose a human-in-the-loop framework that leverages feedback from the end user to quickly identify and augment task-irrelevant visual state concepts. Our framework generates counterfactual demonstrations that let users quickly isolate shifted state concepts and judge whether they should affect the desired task; concepts that should not can then be augmented using existing actions. We present experiments validating the full pipeline on discrete and continuous control tasks with real human users. Our method better enables users to (1) understand agent failures, (2) improve the sample efficiency of demonstrations required for finetuning, and (3) adapt the agent to their desired reward.
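The loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the state encoding, the concept-variation function, and the user query are all stand-ins, and the paper's actual counterfactual generation operates on visual states.

```python
# Hedged sketch of the human-in-the-loop augmentation loop described in the
# abstract. All names and data structures here are illustrative assumptions.

def vary_concept(state, concept, value):
    """Return a counterfactual state with exactly one concept changed."""
    new_state = dict(state)
    new_state[concept] = value
    return new_state

def augment_dataset(demos, concept, values, user_says_irrelevant):
    """If the end user labels `concept` task-irrelevant (after viewing
    counterfactual demonstrations), reuse each demo's existing action for
    every counterfactual state; otherwise leave the data unchanged."""
    if not user_says_irrelevant:
        return list(demos)
    augmented = list(demos)
    for state, action in demos:
        for v in values:
            # task-irrelevant concept: the desired action stays the same
            augmented.append((vary_concept(state, concept, v), action))
    return augmented

# Toy example: one (state, action) demo; the user flags color as irrelevant.
demos = [({"color": "red", "pos": 0}, "pick")]
augmented = augment_dataset(demos, "color", ["blue", "green"],
                            user_says_irrelevant=True)
# yields the original pair plus two color-counterfactuals sharing its action
```

The key design point the sketch captures is that augmentation reuses existing actions: once a concept is confirmed irrelevant, no new demonstrations are needed to cover its variations.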
Author Information
Andi Peng (MIT)
Aviv Netanyahu (MIT)
Mark Ho (New York University)
Tianmin Shu (MIT)
Andreea Bobu (University of California Berkeley)
Julie Shah (MIT)
Pulkit Agrawal (MIT)
More from the Same Authors
-
2021 : Topological Experience Replay for Fast Q-Learning »
Zhang-Wei Hong · Tao Chen · Yen-Chen Lin · Joni Pajarinen · Pulkit Agrawal -
2021 : Understanding the Generalization Gap in Visual Reinforcement Learning »
Anurag Ajay · Ge Yang · Ofir Nachum · Pulkit Agrawal -
2022 : Distributionally Adaptive Meta Reinforcement Learning »
Anurag Ajay · Dibya Ghosh · Sergey Levine · Pulkit Agrawal · Abhishek Gupta -
2023 : Visual Dexterity: In-hand Dexterous Manipulation from Depth »
Tao Chen · Megha Tippur · Siyang Wu · Vikash Kumar · Edward Adelson · Pulkit Agrawal -
2023 : Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-loop feedback »
Marcel Torne Villasevil · Max Balsells I Pamies · Zihan Wang · Samedh Desai · Tao Chen · Pulkit Agrawal · Abhishek Gupta -
2023 Workshop: Interactive Learning with Implicit Human Feedback »
Andi Peng · Akanksha Saran · Andreea Bobu · Tengyang Xie · Pierre-Yves Oudeyer · Anca Dragan · John Langford -
2023 : Mark Ho's Talk »
Mark Ho -
2023 Poster: Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation »
Zechu Li · Tao Chen · Zhang-Wei Hong · Anurag Ajay · Pulkit Agrawal -
2023 Poster: Statistical Learning under Heterogenous Distribution Shift »
Max Simchowitz · Anurag Ajay · Pulkit Agrawal · Akshay Krishnamurthy -
2023 Poster: Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks »
Minyoung Huh · Brian Cheung · Pulkit Agrawal · Phillip Isola -
2023 Poster: TGRL: An Algorithm for Teacher Guided Reinforcement Learning »
Idan Shenfeld · Zhang-Wei Hong · Aviv Tamar · Pulkit Agrawal -
2022 : Beyond Learning from Demonstrations »
Andreea Bobu -
2022 Poster: Discovering Generalizable Spatial Goal Representations via Graph-based Active Reward Learning »
Aviv Netanyahu · Tianmin Shu · Josh Tenenbaum · Pulkit Agrawal -
2022 Spotlight: Discovering Generalizable Spatial Goal Representations via Graph-based Active Reward Learning »
Aviv Netanyahu · Tianmin Shu · Josh Tenenbaum · Pulkit Agrawal -
2022 Poster: Prototype Based Classification from Hierarchy to Fairness »
Mycal Tucker · Julie Shah -
2022 Poster: Offline RL Policies Should Be Trained to be Adaptive »
Dibya Ghosh · Anurag Ajay · Pulkit Agrawal · Sergey Levine -
2022 Spotlight: Prototype Based Classification from Hierarchy to Fairness »
Mycal Tucker · Julie Shah -
2022 Oral: Offline RL Policies Should Be Trained to be Adaptive »
Dibya Ghosh · Anurag Ajay · Pulkit Agrawal · Sergey Levine -
2021 Workshop: Self-Supervised Learning for Reasoning and Perception »
Pengtao Xie · Shanghang Zhang · Ishan Misra · Pulkit Agrawal · Katerina Fragkiadaki · Ruisi Zhang · Tassilo Klein · Asli Celikyilmaz · Mihaela van der Schaar · Eric Xing -
2021 : Robust Robot Learning via Human-Guided Intermediate Representations »
Andreea Bobu -
2021 Poster: Learning Task Informed Abstractions »
Xiang Fu · Ge Yang · Pulkit Agrawal · Tommi Jaakkola -
2021 Spotlight: Learning Task Informed Abstractions »
Xiang Fu · Ge Yang · Pulkit Agrawal · Tommi Jaakkola -
2018 Poster: Investigating Human Priors for Playing Video Games »
Rachit Dubey · Pulkit Agrawal · Deepak Pathak · Tom Griffiths · Alexei Efros -
2018 Oral: Investigating Human Priors for Playing Video Games »
Rachit Dubey · Pulkit Agrawal · Deepak Pathak · Tom Griffiths · Alexei Efros -
2017 Poster: Curiosity-driven Exploration by Self-supervised Prediction »
Deepak Pathak · Pulkit Agrawal · Alexei Efros · Trevor Darrell -
2017 Talk: Curiosity-driven Exploration by Self-supervised Prediction »
Deepak Pathak · Pulkit Agrawal · Alexei Efros · Trevor Darrell