The number of states in a dynamic process is exponential in the number of objects, making reinforcement learning (RL) difficult in complex, multi-object domains. For agents to scale to the real world, they will need to react to and reason about unseen combinations of objects. We argue that the ability to recognize and use local factorization in transition dynamics is a key element in unlocking the power of multi-object reasoning. To this end, we show that (1) known local structure in the environment transitions is sufficient for an exponential reduction in the sample complexity of training a dynamics model, and (2) a locally factored dynamics model provably generalizes out-of-distribution to unseen states and actions. Knowing the local structure also allows us to predict which unseen states and actions this dynamics model will generalize to. We propose to leverage these observations in a novel Model-based Counterfactual Data Augmentation (MoCoDA) framework. MoCoDA applies a learned locally factored dynamics model to an augmented distribution of states and actions to generate counterfactual transitions for RL. MoCoDA works with a broader set of local structures than prior work and allows for direct control over the augmented training distribution. We show that MoCoDA enables RL agents to learn policies that generalize to unseen states and actions. We use MoCoDA to train an offline RL agent to solve an out-of-distribution robotics manipulation task on which standard offline RL algorithms fail.
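To make the pipeline concrete, below is a minimal Python sketch of the MoCoDA idea on a toy two-object domain: fit one dynamics model per local factor, recombine factor values across observed transitions to form an augmented state-action distribution, and let the factored model label the recombined inputs as counterfactual training data. The toy dynamics, the linear models, and all names (`true_step`, `fit_linear`, etc.) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_step(s1, s2, a):
    # Ground-truth dynamics: each object's next state depends only on
    # its own state and the action -- the known local factorization.
    # (Hypothetical toy dynamics, not from the paper.)
    return s1 + 0.1 * a, s2 - 0.05 * a

# A narrow "offline" dataset: the two factors are correlated, so most
# joint (s1, s2) combinations are never observed together.
N = 500
S1 = rng.uniform(0.0, 1.0, (N, 1))
S2 = S1 + 2.0
A = rng.uniform(-1.0, 1.0, (N, 1))
NS1, NS2 = true_step(S1, S2, A)

def fit_linear(X, Y):
    # One small model per local factor; linear least squares stands in
    # for the learned dynamics model.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return lambda Xq: np.hstack([Xq, np.ones((len(Xq), 1))]) @ W

f1 = fit_linear(np.hstack([S1, A]), NS1)  # predicts s1' from (s1, a)
f2 = fit_linear(np.hstack([S2, A]), NS2)  # predicts s2' from (s2, a)

# Augmented state-action distribution: recombine factor values from
# different transitions, covering joint states unseen in the dataset.
M = 1000
i, j, k = (rng.integers(0, N, M) for _ in range(3))
aug_s1, aug_s2, aug_a = S1[i], S2[j], A[k]

# Counterfactual transitions from the factored model; these would be
# appended to the replay buffer of an offline RL learner.
aug_ns1 = f1(np.hstack([aug_s1, aug_a]))
aug_ns2 = f2(np.hstack([aug_s2, aug_a]))

# Sanity check: the factored model generalizes to the recombined inputs
# even though these joint states never appear in the training data.
true_ns1, true_ns2 = true_step(aug_s1, aug_s2, aug_a)
err = max(np.abs(aug_ns1 - true_ns1).max(), np.abs(aug_ns2 - true_ns2).max())
print(f"max prediction error on counterfactual transitions: {err:.2e}")
```

The key point the sketch illustrates: each factor's model only ever sees inputs from its own training marginal, so the recombined joint states remain in-distribution for the factored model even though they are out-of-distribution for the dataset.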
Author Information
Silviu Pitis (University of Toronto)
Elliot Creager (University of Toronto)
Ajay Mandlekar (NVIDIA)
Animesh Garg (University of Toronto, Vector Institute, NVIDIA)
More from the Same Authors
- 2020: Counterfactual Data Augmentation using Locally Factored Dynamics
  Silviu Pitis
- 2021: Measuring User Recourse in a Dynamic Recommender System
  Dilys Dickson · Elliot Creager
- 2021: Online Algorithmic Recourse by Collective Action
  Elliot Creager · Richard Zemel
- 2021: Auditing AI models for Verified Deployment under Semantic Specifications
  Homanga Bharadhwaj · De-An Huang · Chaowei Xiao · Anima Anandkumar · Animesh Garg
- 2021: Optimistic Exploration with Backward Bootstrapped Bonus for Deep Reinforcement Learning
  Chenjia Bai · Lingxiao Wang · Lei Han · Jianye Hao · Animesh Garg · Peng Liu · Zhaoran Wang
- 2021: Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings
  Shunshi Zhang · Murat Erdogdu · Animesh Garg
- 2021: Learning by Watching: Physical Imitation of Manipulation Skills from Human Videos
  Haoyu Xiong · Yun-Chun Chen · Homanga Bharadhwaj · Samarth Sinha · Animesh Garg
- 2022: Towards Environment-Invariant Representation Learning for Robust Task Transfer
  Benjamin Eyre · Richard Zemel · Elliot Creager
- 2022: VIPer: Iterative Value-Aware Model Learning on the Value Improvement Path
  Romina Abachi · Claas Voelcker · Animesh Garg · Amir-massoud Farahmand
- 2022 Poster: Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics
  Matthias Weissenbacher · Samarth Sinha · Animesh Garg · Yoshinobu Kawahara
- 2022 Spotlight: Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics
  Matthias Weissenbacher · Samarth Sinha · Animesh Garg · Yoshinobu Kawahara
- 2021 Poster: On Disentangled Representations Learned from Correlated Data
  Frederik Träuble · Elliot Creager · Niki Kilbertus · Francesco Locatello · Andrea Dittadi · Anirudh Goyal · Bernhard Schölkopf · Stefan Bauer
- 2021 Poster: Environment Inference for Invariant Learning
  Elliot Creager · Joern-Henrik Jacobsen · Richard Zemel
- 2021 Spotlight: Environment Inference for Invariant Learning
  Elliot Creager · Joern-Henrik Jacobsen · Richard Zemel
- 2021 Oral: On Disentangled Representations Learned from Correlated Data
  Frederik Träuble · Elliot Creager · Niki Kilbertus · Francesco Locatello · Andrea Dittadi · Anirudh Goyal · Bernhard Schölkopf · Stefan Bauer
- 2021 Poster: Principled Exploration via Optimistic Bootstrapping and Backward Induction
  Chenjia Bai · Lingxiao Wang · Lei Han · Jianye Hao · Animesh Garg · Peng Liu · Zhaoran Wang
- 2021 Poster: Value Iteration in Continuous Actions, States and Time
  Michael Lutter · Shie Mannor · Jan Peters · Dieter Fox · Animesh Garg
- 2021 Spotlight: Value Iteration in Continuous Actions, States and Time
  Michael Lutter · Shie Mannor · Jan Peters · Dieter Fox · Animesh Garg
- 2021 Spotlight: Principled Exploration via Optimistic Bootstrapping and Backward Induction
  Chenjia Bai · Lingxiao Wang · Lei Han · Jianye Hao · Animesh Garg · Peng Liu · Zhaoran Wang
- 2021 Poster: Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning
  Anuj Mahajan · Mikayel Samvelyan · Lei Mao · Viktor Makoviychuk · Animesh Garg · Jean Kossaifi · Shimon Whiteson · Yuke Zhu · Anima Anandkumar
- 2021 Poster: Coach-Player Multi-agent Reinforcement Learning for Dynamic Team Composition
  Bo Liu · Qiang Liu · Peter Stone · Animesh Garg · Yuke Zhu · Anima Anandkumar
- 2021 Spotlight: Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning
  Anuj Mahajan · Mikayel Samvelyan · Lei Mao · Viktor Makoviychuk · Animesh Garg · Jean Kossaifi · Shimon Whiteson · Yuke Zhu · Anima Anandkumar
- 2021 Oral: Coach-Player Multi-agent Reinforcement Learning for Dynamic Team Composition
  Bo Liu · Qiang Liu · Peter Stone · Animesh Garg · Yuke Zhu · Anima Anandkumar
- 2020 Poster: Semi-Supervised StyleGAN for Disentanglement Learning
  Weili Nie · Tero Karras · Animesh Garg · Shoubhik Debnath · Anjul Patney · Ankit Patel · Anima Anandkumar
- 2020 Poster: Causal Modeling for Fairness in Dynamical Systems
  Elliot Creager · David Madras · Toniann Pitassi · Richard Zemel
- 2020 Poster: Angular Visual Hardness
  Beidi Chen · Weiyang Liu · Zhiding Yu · Jan Kautz · Anshumali Shrivastava · Animesh Garg · Anima Anandkumar
- 2020 Poster: Optimizing Long-term Social Welfare in Recommender Systems: A Constrained Matching Approach
  Martin Mladenov · Elliot Creager · Omer Ben-Porat · Kevin Swersky · Richard Zemel · Craig Boutilier
- 2019 Poster: Flexibly Fair Representation Learning by Disentanglement
  Elliot Creager · David Madras · Joern-Henrik Jacobsen · Marissa Weis · Kevin Swersky · Toniann Pitassi · Richard Zemel
- 2019 Oral: Flexibly Fair Representation Learning by Disentanglement
  Elliot Creager · David Madras · Joern-Henrik Jacobsen · Marissa Weis · Kevin Swersky · Toniann Pitassi · Richard Zemel
- 2018 Poster: Learning Adversarially Fair and Transferable Representations
  David Madras · Elliot Creager · Toniann Pitassi · Richard Zemel
- 2018 Oral: Learning Adversarially Fair and Transferable Representations
  David Madras · Elliot Creager · Toniann Pitassi · Richard Zemel