Policy learning using historical observational data is an important problem with widespread applications. However, the existing literature rests on the crucial assumption that the future environment in which the learned policy will be deployed is the same as the past environment that generated the data, an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributionally robust policy from bandit observational data. We propose a novel learning algorithm that produces a policy robust to adversarial perturbations and unknown covariate shifts. We first present a policy evaluation procedure for the ambiguous environment, and then give a heuristic algorithm that solves the distributionally robust policy learning problem efficiently. Finally, we provide extensive simulations that demonstrate the robustness of our policy.
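The abstract does not spell out the ambiguity set or estimator, so as a minimal illustrative sketch (not the paper's method), the code below computes a distributionally robust value for a fixed policy using the standard convex dual of a worst-case expectation over a KL ball around the empirical reward distribution; the function name `dr_policy_value` and the radius `delta` are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def dr_policy_value(rewards, delta):
    """Worst-case mean reward over a KL ball of radius `delta` around the
    empirical distribution P_n, via the standard duality
        inf_{Q : KL(Q || P_n) <= delta} E_Q[R]
          = sup_{alpha > 0} { -alpha * log E_{P_n}[exp(-R / alpha)] - alpha * delta }.
    """
    rewards = np.asarray(rewards, dtype=float)

    def neg_dual(alpha):
        # Stable log-mean-exp of -R / alpha.
        z = -rewards / alpha
        lme = np.log(np.mean(np.exp(z - z.max()))) + z.max()
        return -(-alpha * lme - alpha * delta)  # negate for minimization

    res = minimize_scalar(neg_dual, bounds=(1e-6, 1e6), method="bounded")
    return -res.fun

# Rewards observed under the policy being evaluated (toy data).
rewards = np.array([1.0, 0.5, 0.8, 0.2, 0.9])
robust_value = dr_policy_value(rewards, delta=0.1)
# The robust value is pessimistic: it never exceeds the empirical mean.
assert robust_value <= rewards.mean() + 1e-8
```

A robust policy learner would then maximize such a worst-case value over a policy class rather than the plain empirical mean; here the dual reduces that inner worst case to a one-dimensional concave maximization in alpha.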
Author Information
Nian Si (Stanford University)
Fan Zhang (Stanford University)
Zhengyuan Zhou (Stanford University)
Jose Blanchet (Stanford University)
More from the Same Authors
- 2022 Poster: Distributionally Robust $Q$-Learning
  Zijian Liu · Jerry Bai · Jose Blanchet · Perry Dong · Wei Xu · Zhengqing Zhou · Zhengyuan Zhou
- 2022 Spotlight: Distributionally Robust $Q$-Learning
  Zijian Liu · Jerry Bai · Jose Blanchet · Perry Dong · Wei Xu · Zhengqing Zhou · Zhengyuan Zhou
- 2021 Poster: Testing Group Fairness via Optimal Transport Projections
  Nian Si · Karthyek Murthy · Jose Blanchet · Viet Anh Nguyen
- 2021 Spotlight: Testing Group Fairness via Optimal Transport Projections
  Nian Si · Karthyek Murthy · Jose Blanchet · Viet Anh Nguyen
- 2021 Poster: Sequential Domain Adaptation by Synthesizing Distributionally Robust Experts
  Bahar Taskesen · Man-Chung Yue · Jose Blanchet · Daniel Kuhn · Viet Anh Nguyen
- 2021 Oral: Sequential Domain Adaptation by Synthesizing Distributionally Robust Experts
  Bahar Taskesen · Man-Chung Yue · Jose Blanchet · Daniel Kuhn · Viet Anh Nguyen
- 2020 Poster: Gradient-free Online Learning in Continuous Games with Delayed Rewards
  Amélie Héliou · Panayotis Mertikopoulos · Zhengyuan Zhou
- 2020 Poster: Robust Bayesian Classification Using An Optimistic Score Ratio
  Viet Anh Nguyen · Nian Si · Jose Blanchet
- 2020 Poster: Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games
  Tianyi Lin · Zhengyuan Zhou · Panayotis Mertikopoulos · Michael Jordan
- 2019 Poster: Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning
  Casey Chu · Jose Blanchet · Peter Glynn
- 2019 Oral: Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning
  Casey Chu · Jose Blanchet · Peter Glynn
- 2018 Poster: MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels
  Lu Jiang · Zhengyuan Zhou · Thomas Leung · Li-Jia Li · Li Fei-Fei
- 2018 Poster: Distributed Asynchronous Optimization with Unbounded Delays: How Slow Can You Go?
  Zhengyuan Zhou · Panayotis Mertikopoulos · Nicholas Bambos · Peter Glynn · Yinyu Ye · Li-Jia Li · Li Fei-Fei
- 2018 Oral: MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels
  Lu Jiang · Zhengyuan Zhou · Thomas Leung · Li-Jia Li · Li Fei-Fei
- 2018 Oral: Distributed Asynchronous Optimization with Unbounded Delays: How Slow Can You Go?
  Zhengyuan Zhou · Panayotis Mertikopoulos · Nicholas Bambos · Peter Glynn · Yinyu Ye · Li-Jia Li · Li Fei-Fei