We consider the problem of improving upon a black-box policy which operates on a different observation space than the learner. Such problems occur when augmenting an existing hand-engineered system with a new machine learning model, or in a shared autonomy / human-AI complementarity context. We prove that following the naive policy gradient can lead to a decrease in performance because of incorrect grounding in a different observation space. Then, if we have access to both sets of observations at train time, we derive a method for correctly estimating a policy gradient via an application of the backdoor criterion. If we do not have such access, we prove that, under certain assumptions, we can use the proxy correction to correctly estimate a direction of improvement.
Author Information
Gokul Swamy (Carnegie Mellon University)
Sanjiban Choudhury (Cornell University)
J. Bagnell (Aurora Innovation)
Steven Wu (Carnegie Mellon University)
More from the Same Authors
-
2021 : Towards the Unification and Robustness of Perturbation and Gradient Based Explanations »
· Sushant Agarwal · Shahin Jabbari · Chirag Agarwal · Sohini Upadhyay · Steven Wu · Hima Lakkaraju -
2021 : Strategic Instrumental Variable Regression: Recovering Causal Relationships From Strategic Responses »
Keegan Harris · Dung Ngo · Logan Stapleton · Hoda Heidari · Steven Wu -
2021 : Stateful Strategic Regression »
Keegan Harris · Hoda Heidari · Steven Wu -
2021 : Iterative Methods for Private Synthetic Data: Unifying Framework and New Methods »
Terrance Liu · Giuseppe Vietri · Steven Wu -
2021 : Private Multi-Task Learning: Formulation and Applications to Federated Learning »
Shengyuan Hu · Steven Wu · Virginia Smith -
2021 : Understanding Clipped FedAvg: Convergence and Client-Level Differential Privacy »
xinwei zhang · Xiangyi Chen · Steven Wu · Mingyi Hong -
2021 : Improved Privacy Filters and Odometers: Time-Uniform Bounds in Privacy Composition »
Justin Whitehouse · Aaditya Ramdas · Ryan Rogers · Steven Wu -
2021 : Of Moments and Matching: A Game-Theoretic Framework for Closing the Imitation Gap »
Gokul Swamy · Sanjiban Choudhury · J. Bagnell · Steven Wu -
2021 : Scalable Algorithms for Nonlinear Causal Inference »
Gokul Swamy · Sanjiban Choudhury · James Bagnell · Steven Wu -
2022 : Meta-Learning Adversarial Bandits »
Nina Balcan · Keegan Harris · Mikhail Khodak · Steven Wu -
2023 : Complementing a Policy with a Different Observation Space »
Gokul Swamy · Sanjiban Choudhury · J. Bagnell · Steven Wu -
2023 : Adaptive Principal Component Regression with Applications to Panel Data »
Anish Agarwal · Keegan Harris · Justin Whitehouse · Steven Wu -
2023 : Strategyproof Decision-Making in Panel Data Settings and Beyond »
Keegan Harris · Anish Agarwal · Chara Podimata · Steven Wu -
2023 : Strategic Apple Tasting »
Keegan Harris · Chara Podimata · Steven Wu -
2023 : Learning Shared Safety Constraints from Multi-task Demonstrations »
Konwoo Kim · Gokul Swamy · Zuxin Liu · Ding Zhao · Sanjiban Choudhury · Steven Wu -
2023 Poster: Fully-Adaptive Composition in Differential Privacy »
Justin Whitehouse · Aaditya Ramdas · Ryan Rogers · Steven Wu -
2023 Oral: Nonparametric Extensions of Randomized Response for Private Confidence Sets »
Ian Waudby-Smith · Steven Wu · Aaditya Ramdas -
2023 Poster: Nonparametric Extensions of Randomized Response for Private Confidence Sets »
Ian Waudby-Smith · Steven Wu · Aaditya Ramdas -
2023 Poster: Inverse Reinforcement Learning without Reinforcement Learning »
Gokul Swamy · David Wu · Sanjiban Choudhury · J. Bagnell · Steven Wu -
2023 Poster: Generating Private Synthetic Data with Genetic Algorithms »
Terrance Liu · Jingwu Tang · Giuseppe Vietri · Steven Wu -
2023 Poster: The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms »
Anirudh Vemula · Yuda Song · Aarti Singh · J. Bagnell · Sanjiban Choudhury -
2022 Poster: Information Discrepancy in Strategic Learning »
Yahav Bechavod · Chara Podimata · Steven Wu · Juba Ziani -
2022 Poster: Constrained Variational Policy Optimization for Safe Reinforcement Learning »
Zuxin Liu · Zhepeng Cen · Vladislav Isenbaev · Wei Liu · Steven Wu · Bo Li · Ding Zhao -
2022 Poster: Causal Imitation Learning under Temporally Correlated Noise »
Gokul Swamy · Sanjiban Choudhury · James Bagnell · Steven Wu -
2022 Spotlight: Constrained Variational Policy Optimization for Safe Reinforcement Learning »
Zuxin Liu · Zhepeng Cen · Vladislav Isenbaev · Wei Liu · Steven Wu · Bo Li · Ding Zhao -
2022 Spotlight: Information Discrepancy in Strategic Learning »
Yahav Bechavod · Chara Podimata · Steven Wu · Juba Ziani -
2022 Oral: Causal Imitation Learning under Temporally Correlated Noise »
Gokul Swamy · Sanjiban Choudhury · James Bagnell · Steven Wu -
2022 Poster: Strategic Instrumental Variable Regression: Recovering Causal Relationships From Strategic Responses »
Keegan Harris · Dung Ngo · Logan Stapleton · Hoda Heidari · Steven Wu -
2022 Poster: Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning »
Alberto Bietti · Chen-Yu Wei · Miroslav Dudik · John Langford · Steven Wu -
2022 Poster: Improved Regret for Differentially Private Exploration in Linear MDP »
Dung Ngo · Giuseppe Vietri · Steven Wu -
2022 Poster: Understanding Clipping for Federated Learning: Convergence and Client-Level Differential Privacy »
xinwei zhang · Xiangyi Chen · Mingyi Hong · Steven Wu · Jinfeng Yi -
2022 Spotlight: Understanding Clipping for Federated Learning: Convergence and Client-Level Differential Privacy »
xinwei zhang · Xiangyi Chen · Mingyi Hong · Steven Wu · Jinfeng Yi -
2022 Spotlight: Improved Regret for Differentially Private Exploration in Linear MDP »
Dung Ngo · Giuseppe Vietri · Steven Wu -
2022 Spotlight: Strategic Instrumental Variable Regression: Recovering Causal Relationships From Strategic Responses »
Keegan Harris · Dung Ngo · Logan Stapleton · Hoda Heidari · Steven Wu -
2022 Spotlight: Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning »
Alberto Bietti · Chen-Yu Wei · Miroslav Dudik · John Langford · Steven Wu -
2021 : Poster »
Shiji Zhou · Nastaran Okati · Wichinpong Sinchaisri · Kim de Bie · Ana Lucic · Mina Khan · Ishaan Shah · JINGHUI LU · Andreas Kirsch · Julius Frost · Ze Gong · Gokul Swamy · Ah Young Kim · Ahmed Baruwa · Ranganath Krishnan -
2021 Poster: Leveraging Public Data for Practical Private Query Release »
Terrance Liu · Giuseppe Vietri · Thomas Steinke · Jonathan Ullman · Steven Wu -
2021 Spotlight: Leveraging Public Data for Practical Private Query Release »
Terrance Liu · Giuseppe Vietri · Thomas Steinke · Jonathan Ullman · Steven Wu -
2021 Poster: Of Moments and Matching: A Game-Theoretic Framework for Closing the Imitation Gap »
Gokul Swamy · Sanjiban Choudhury · J. Bagnell · Steven Wu -
2021 Spotlight: Of Moments and Matching: A Game-Theoretic Framework for Closing the Imitation Gap »
Gokul Swamy · Sanjiban Choudhury · J. Bagnell · Steven Wu -
2021 Poster: Towards the Unification and Robustness of Perturbation and Gradient Based Explanations »
Sushant Agarwal · Shahin Jabbari · Chirag Agarwal · Sohini Upadhyay · Steven Wu · Hima Lakkaraju -
2021 Poster: Incentivizing Compliance with Algorithmic Instruments »
Dung Ngo · Logan Stapleton · Vasilis Syrgkanis · Steven Wu -
2021 Spotlight: Incentivizing Compliance with Algorithmic Instruments »
Dung Ngo · Logan Stapleton · Vasilis Syrgkanis · Steven Wu -
2021 Spotlight: Towards the Unification and Robustness of Perturbation and Gradient Based Explanations »
Sushant Agarwal · Shahin Jabbari · Chirag Agarwal · Sohini Upadhyay · Steven Wu · Hima Lakkaraju