Poster
Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation
Nathan Kallus · Masatoshi Uehara
Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible. We consider for the first time the semiparametric efficiency limits of OPE in Markov decision processes (MDPs), where actions, rewards, and states are memoryless. We show that existing OPE estimators may fail to be efficient in this setting. We develop a new estimator based on cross-fold estimation of $q$-functions and marginalized density ratios, which we term double reinforcement learning (DRL). We show that DRL is efficient when both components are estimated at fourth-root rates and is also doubly robust when only one component is consistent. We investigate these properties empirically and demonstrate the performance benefits of harnessing memorylessness.
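To make the structure concrete, below is a minimal sketch of a DRL-style doubly robust estimate, not the authors' implementation. It assumes a fitted $q$-function `q_hat` and a fitted marginalized state-action density ratio `w_hat`, each trained on data folds other than the one being scored (mirroring the cross-fold scheme the abstract describes); the names `drl_estimate`, `q_hat`, `w_hat`, and `pi_e` are illustrative, and the paper's finite-horizon estimator uses per-time-step ratios rather than the single stationary ratio sketched here.

```python
import numpy as np

def drl_estimate(trajectories, q_hat, w_hat, pi_e, gamma=1.0):
    """Hedged sketch of a DRL-style doubly robust OPE estimate.

    trajectories: list of episodes, each a list of (s, a, r, s_next)
        tuples logged under the behavior policy.
    q_hat(s, a):  estimated q-function for the evaluation policy,
        fit on a held-out fold; assumed 0 at absorbing terminal states.
    w_hat(s, a):  estimated marginalized density ratio
        d_pi_e(s, a) / d_pi_b(s, a), also fit on a held-out fold.
    pi_e(s):      evaluation-policy action distribution at s, as a
        dict mapping actions to probabilities.
    """
    def v_hat(s):
        # Plug-in state value under pi_e: E_{a ~ pi_e}[q_hat(s, a)].
        return sum(p * q_hat(s, a) for a, p in pi_e(s).items())

    estimates = []
    for episode in trajectories:
        s0 = episode[0][0]
        total = v_hat(s0)  # direct-method baseline from the q-model
        for t, (s, a, r, s_next) in enumerate(episode):
            # Density-ratio-weighted TD residual: corrects the bias of
            # the q-model. Its mean is zero if either q_hat or w_hat is
            # correct, which is the source of double robustness.
            td = r + gamma * v_hat(s_next) - q_hat(s, a)
            total += (gamma ** t) * w_hat(s, a) * td
        estimates.append(total)
    return float(np.mean(estimates))
```

The weighted TD correction is mean-zero whenever either nuisance is consistent, which is where the double robustness comes from; efficiency additionally requires both nuisances to converge at $n^{-1/4}$ rates, as stated in the abstract.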
Author Information
Nathan Kallus (Cornell University)
Masatoshi Uehara (Harvard University)
More from the Same Authors
- 2023 : Provable Offline Reinforcement Learning with Human Feedback »
  Wenhao Zhan · Masatoshi Uehara · Nathan Kallus · Jason Lee · Wen Sun
- 2023 Poster: Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR »
  Kaiwen Wang · Nathan Kallus · Wen Sun
- 2023 Poster: Smooth Non-stationary Bandits »
  Su Jia · Qian Xie · Nathan Kallus · Peter Frazier
- 2023 Poster: Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings »
  Masatoshi Uehara · Ayush Sekhari · Jason Lee · Nathan Kallus · Wen Sun
- 2023 Poster: B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding »
  Miruna Oprescu · Jacob Dorn · Marah Ghoummaid · Andrew Jesson · Nathan Kallus · Uri Shalit
- 2022 Poster: Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning »
  Nathan Kallus · Xiaojie Mao · Kaiwen Wang · Zhengyuan Zhou
- 2022 Poster: Learning Bellman Complete Representations for Offline Policy Evaluation »
  Jonathan Chang · Kaiwen Wang · Nathan Kallus · Wen Sun
- 2022 Spotlight: Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning »
  Nathan Kallus · Xiaojie Mao · Kaiwen Wang · Zhengyuan Zhou
- 2022 Oral: Learning Bellman Complete Representations for Offline Policy Evaluation »
  Jonathan Chang · Kaiwen Wang · Nathan Kallus · Wen Sun
- 2021 Poster: Optimal Off-Policy Evaluation from Multiple Logging Policies »
  Nathan Kallus · Yuta Saito · Masatoshi Uehara
- 2021 Spotlight: Optimal Off-Policy Evaluation from Multiple Logging Policies »
  Nathan Kallus · Yuta Saito · Masatoshi Uehara
- 2020 Poster: Minimax Weight and Q-Function Learning for Off-Policy Evaluation »
  Masatoshi Uehara · Jiawei Huang · Nan Jiang
- 2020 Poster: Statistically Efficient Off-Policy Policy Gradients »
  Nathan Kallus · Masatoshi Uehara
- 2020 Poster: DeepMatch: Balancing Deep Covariate Representations for Causal Inference Using Adversarial Training »
  Nathan Kallus
- 2020 Poster: Efficient Policy Learning from Surrogate-Loss Classification Reductions »
  Andrew Bennett · Nathan Kallus
- 2019 Poster: Classifying Treatment Responders Under Causal Effect Monotonicity »
  Nathan Kallus
- 2019 Oral: Classifying Treatment Responders Under Causal Effect Monotonicity »
  Nathan Kallus
- 2018 Poster: Residual Unfairness in Fair Machine Learning from Prejudiced Data »
  Nathan Kallus · Angela Zhou
- 2018 Oral: Residual Unfairness in Fair Machine Learning from Prejudiced Data »
  Nathan Kallus · Angela Zhou
- 2017 Poster: Recursive Partitioning for Personalization using Observational Data »
  Nathan Kallus
- 2017 Talk: Recursive Partitioning for Personalization using Observational Data »
  Nathan Kallus