Poster
Distributionally Robust $Q$-Learning
Zijian Liu · Jerry Bai · Jose Blanchet · Perry Dong · Wei Xu · Zhengqing Zhou · Zhengyuan Zhou
Reinforcement learning (RL) has demonstrated remarkable achievements in simulated environments. However, carrying this success over to real environments requires the important attribute of robustness, which existing RL algorithms often lack, as they assume that the future deployment environment is the same as the training environment (i.e., the simulator) in which the policy is learned. This assumption often fails due to the discrepancy between the simulator and the real environment, rendering the learned policy fragile when deployed. In this paper, we propose a novel distributionally robust $Q$-learning algorithm that learns the best policy under the worst distributional perturbation of the environment. Our algorithm first transforms the infinite-dimensional learning problem (since the environment MDP perturbation lies in an infinite-dimensional space) into a finite-dimensional dual problem and then uses a multi-level Monte-Carlo scheme to approximate the dual value using samples from the simulator. Despite this complexity, we show that the resulting distributionally robust $Q$-learning algorithm asymptotically converges to the optimal worst-case policy, making it robust to future environment changes. Simulation results further demonstrate its strong empirical robustness.
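The dual step described in the abstract can be illustrated with a minimal sketch. Assuming, as a simplification rather than the paper's exact construction, a KL-divergence ambiguity set, the worst-case expectation over all perturbed distributions admits the finite-dimensional dual $\sup_{\alpha>0}\{-\alpha \log \mathbb{E}[e^{-V/\alpha}] - \alpha\rho\}$, where $\rho$ is the radius of the ambiguity set. The hypothetical helper `robust_expectation` below approximates that dual with a crude grid search over the scalar dual variable, standing in for the paper's multi-level Monte-Carlo estimator:

```python
import math

def robust_expectation(values, rho, alphas=None):
    """Approximate the worst-case mean of `values` over all distributions
    within KL divergence `rho` of the empirical one, via the dual
        sup_{a > 0}  -a * log E[exp(-V / a)] - a * rho.
    Hypothetical sketch: a coarse grid over the scalar dual variable
    replaces the paper's multi-level Monte-Carlo estimation scheme."""
    if alphas is None:
        alphas = [10.0 ** k for k in range(-3, 4)]  # grid over the dual variable
    best = -math.inf
    for a in alphas:
        # numerically stable log-mean-exp of -V / a
        m = max(-v / a for v in values)
        lme = m + math.log(sum(math.exp(-v / a - m) for v in values) / len(values))
        best = max(best, -a * lme - a * rho)
    return best

# Robust Bellman target for one (s, a, r) transition: the next-state values
# max_a' Q(s', a') come from simulator samples, and the robust operator
# replaces their plain average with the worst-case expectation above.
next_state_values = [1.0, 0.5, 1.2, 0.8]  # illustrative sampled values
gamma, reward, rho = 0.9, 0.3, 0.1
robust_target = reward + gamma * robust_expectation(next_state_values, rho)
```

With $\rho \to 0$ the dual recovers the ordinary expectation and the update reduces to standard $Q$-learning; larger $\rho$ makes the target more pessimistic, shrinking it toward the worst sampled next-state value.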
Author Information
Zijian Liu (Boston University)
Jerry Bai (Horizon Robotics)
Jose Blanchet (Stanford University)
Perry Dong (University of California, Berkeley)
Wei Xu (Horizon Robotics)
Zhengqing Zhou (Stanford University)
Zhengyuan Zhou (Arena Technologies)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Spotlight: Distributionally Robust $Q$-Learning »
  Wed. Jul 20th 09:45 -- 09:50 PM, Room 318 - 320
More from the Same Authors
- 2023 Poster: Offline Reinforcement Learning with Closed-Form Policy Improvement Operators »
  Jiachen Li · Edwin Zhang · Ming Yin · Jerry Bai · Yu-Xiang Wang · William Wang
- 2022 Poster: Adaptive Accelerated (Extra-)Gradient Methods with Variance Reduction »
  Zijian Liu · Ta Duy Nguyen · Alina Ene · Huy Nguyen
- 2022 Poster: Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning »
  Nathan Kallus · Xiaojie Mao · Kaiwen Wang · Zhengyuan Zhou
- 2022 Spotlight: Adaptive Accelerated (Extra-)Gradient Methods with Variance Reduction »
  Zijian Liu · Ta Duy Nguyen · Alina Ene · Huy Nguyen
- 2022 Spotlight: Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning »
  Nathan Kallus · Xiaojie Mao · Kaiwen Wang · Zhengyuan Zhou
- 2021 Poster: Testing Group Fairness via Optimal Transport Projections »
  Nian Si · Karthyek Murthy · Jose Blanchet · Viet Anh Nguyen
- 2021 Spotlight: Testing Group Fairness via Optimal Transport Projections »
  Nian Si · Karthyek Murthy · Jose Blanchet · Viet Anh Nguyen
- 2021 Poster: Sequential Domain Adaptation by Synthesizing Distributionally Robust Experts »
  Bahar Taskesen · Man-Chung Yue · Jose Blanchet · Daniel Kuhn · Viet Anh Nguyen
- 2021 Oral: Sequential Domain Adaptation by Synthesizing Distributionally Robust Experts »
  Bahar Taskesen · Man-Chung Yue · Jose Blanchet · Daniel Kuhn · Viet Anh Nguyen
- 2021 Poster: Generative Particle Variational Inference via Estimation of Functional Gradients »
  Neale Ratzlaff · Jerry Bai · Fuxin Li · Wei Xu
- 2021 Spotlight: Generative Particle Variational Inference via Estimation of Functional Gradients »
  Neale Ratzlaff · Jerry Bai · Fuxin Li · Wei Xu
- 2020 Poster: Implicit Generative Modeling for Efficient Exploration »
  Neale Ratzlaff · Qinxun Bai · Fuxin Li · Wei Xu
- 2020 Poster: Robust Bayesian Classification Using An Optimistic Score Ratio »
  Viet Anh Nguyen · Nian Si · Jose Blanchet
- 2020 Poster: Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits »
  Nian Si · Fan Zhang · Zhengyuan Zhou · Jose Blanchet
- 2019 Poster: Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning »
  Casey Chu · Jose Blanchet · Peter Glynn
- 2019 Oral: Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning »
  Casey Chu · Jose Blanchet · Peter Glynn