Spotlight
Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning
Nathan Kallus · Xiaojie Mao · Kaiwen Wang · Zhengyuan Zhou
Off-policy evaluation and learning (OPE/L) use offline observational data to make better decisions, which is crucial in applications where online experimentation is limited. However, depending entirely on logged data, OPE/L is sensitive to environment distribution shifts --- discrepancies between the data-generating environment and that where policies are deployed. Si et al. (2020) proposed distributionally robust OPE/L (DROPE/L) to address this, but the proposal relies on inverse-propensity weighting, whose estimation error and regret deteriorate if propensities are nonparametrically estimated and whose variance is suboptimal even if they are not. For standard, non-robust OPE/L, this is solved by doubly robust (DR) methods, but they do not naturally extend to the more complex DROPE/L, which involves a worst-case expectation. In this paper, we propose the first DR algorithms for DROPE/L with KL-divergence uncertainty sets. For evaluation, we propose Localized Doubly Robust DROPE (LDR$^2$OPE) and show that it achieves semiparametric efficiency under weak product rate conditions. Thanks to a localization technique, LDR$^2$OPE requires fitting only a small number of regressions, just like DR methods for standard OPE. For learning, we propose Continuum Doubly Robust DROPL (CDR$^2$OPL) and show that, under a product rate condition involving a continuum of regressions, it enjoys a fast regret rate of $O(N^{-1/2})$ even when unknown propensities are nonparametrically estimated. We empirically validate our algorithms in simulations and further extend our results to general $f$-divergence uncertainty sets.
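For reference, the worst-case expectation over a KL-divergence ball that makes DROPE/L harder than standard OPE/L admits a well-known dual representation, which reduces the inner infimum over distributions to a one-dimensional optimization. The notation below ($R(\pi)$ for the reward under policy $\pi$, $P_0$ for the data-generating distribution, $\delta$ for the uncertainty radius) is illustrative rather than the paper's:

$$\inf_{Q:\,\mathrm{KL}(Q\,\|\,P_0)\le\delta}\; \mathbb{E}_{Q}\big[R(\pi)\big] \;=\; \sup_{\alpha>0}\;\Big\{ -\alpha \log \mathbb{E}_{P_0}\big[e^{-R(\pi)/\alpha}\big] \;-\; \alpha\,\delta \Big\}.$$

Inverse-propensity-weighting approaches estimate the expectation inside this dual with importance-weighted averages over the logged data; the doubly robust estimators proposed here combine such weights with fitted regressions, in the usual DR fashion, to tolerate nuisance-estimation error.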
Author Information
Nathan Kallus (Cornell University)
Xiaojie Mao (Tsinghua University)
Kaiwen Wang (Cornell University and Cornell Tech)
Zhengyuan Zhou (Arena Technologies)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Poster: Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning
  Thu. Jul 21st through Fri. the 22nd, Room Hall E #904
More from the Same Authors
- 2023: Provable Offline Reinforcement Learning with Human Feedback
  Wenhao Zhan · Masatoshi Uehara · Nathan Kallus · Jason Lee · Wen Sun
- 2023 Poster: Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR
  Kaiwen Wang · Nathan Kallus · Wen Sun
- 2023 Poster: Smooth Non-stationary Bandits
  Su Jia · Qian Xie · Nathan Kallus · Peter Frazier
- 2023 Poster: Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings
  Masatoshi Uehara · Ayush Sekhari · Jason Lee · Nathan Kallus · Wen Sun
- 2023 Poster: B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding
  Miruna Oprescu · Jacob Dorn · Marah Ghoummaid · Andrew Jesson · Nathan Kallus · Uri Shalit
- 2022 Poster: Learning Bellman Complete Representations for Offline Policy Evaluation
  Jonathan Chang · Kaiwen Wang · Nathan Kallus · Wen Sun
- 2022 Oral: Learning Bellman Complete Representations for Offline Policy Evaluation
  Jonathan Chang · Kaiwen Wang · Nathan Kallus · Wen Sun
- 2022 Poster: Distributionally Robust $Q$-Learning
  Zijian Liu · Jerry Bai · Jose Blanchet · Perry Dong · Wei Xu · Zhengqing Zhou · Zhengyuan Zhou
- 2022 Spotlight: Distributionally Robust $Q$-Learning
  Zijian Liu · Jerry Bai · Jose Blanchet · Perry Dong · Wei Xu · Zhengqing Zhou · Zhengyuan Zhou
- 2021 Poster: Optimal Off-Policy Evaluation from Multiple Logging Policies
  Nathan Kallus · Yuta Saito · Masatoshi Uehara
- 2021 Spotlight: Optimal Off-Policy Evaluation from Multiple Logging Policies
  Nathan Kallus · Yuta Saito · Masatoshi Uehara
- 2020 Poster: Statistically Efficient Off-Policy Policy Gradients
  Nathan Kallus · Masatoshi Uehara
- 2020 Poster: DeepMatch: Balancing Deep Covariate Representations for Causal Inference Using Adversarial Training
  Nathan Kallus
- 2020 Poster: Efficient Policy Learning from Surrogate-Loss Classification Reductions
  Andrew Bennett · Nathan Kallus
- 2020 Poster: Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation
  Nathan Kallus · Masatoshi Uehara
- 2019 Poster: Classifying Treatment Responders Under Causal Effect Monotonicity
  Nathan Kallus
- 2019 Oral: Classifying Treatment Responders Under Causal Effect Monotonicity
  Nathan Kallus
- 2018 Poster: Residual Unfairness in Fair Machine Learning from Prejudiced Data
  Nathan Kallus · Angela Zhou
- 2018 Oral: Residual Unfairness in Fair Machine Learning from Prejudiced Data
  Nathan Kallus · Angela Zhou
- 2017 Poster: Recursive Partitioning for Personalization using Observational Data
  Nathan Kallus
- 2017 Talk: Recursive Partitioning for Personalization using Observational Data
  Nathan Kallus