Distributionally Robust $Q$-Learning
Zijian Liu · Zhengqing Zhou · Perry Dong · Jerry Bai · Jose Blanchet · Wei Xu · Zhengyuan Zhou

Wed Jul 20 03:30 PM -- 05:30 PM (PDT) @ Hall E #930
Reinforcement learning (RL) has demonstrated remarkable achievements in simulated environments. However, carrying this success over to real environments requires the important attribute of robustness, which existing RL algorithms often lack: they assume the future deployment environment is the same as the training environment (i.e., the simulator) in which the policy is learned, an assumption that 1) often does not hold due to the discrepancy between the simulator and the real environment and 2) renders the learned policy fragile as a result. In this paper, we aim to make initial progress in addressing this robustness problem. In particular, we propose a novel distributionally robust $Q$-learning algorithm that learns the best policy under the worst distributional perturbation of the environment. Our algorithm first transforms the infinite-dimensional learning problem (since the environment MDP perturbation lies in an infinite-dimensional space) into a finite-dimensional dual problem, and subsequently uses a multi-level Monte Carlo scheme to approximate the dual value using samples from the simulator. Despite this complexity, we show that the resulting distributionally robust $Q$-learning algorithm asymptotically converges to the optimal worst-case policy, thus making it robust to future environment changes. Simulation results further demonstrate its empirical robustness.
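To make the two steps in the abstract concrete, here is a minimal, illustrative sketch of a tabular robust $Q$-learning loop. It is not the paper's algorithm: it assumes a KL-divergence ambiguity set of radius `delta`, whose dual is the standard formula $\sup_{\beta>0}\{-\beta \log \mathbb{E}_{P_0}[e^{-V/\beta}] - \beta\delta\}$, approximates the dual value with plain Monte Carlo and a grid over $\beta$ (rather than the paper's multi-level Monte Carlo scheme), and uses hypothetical names (`kl_dual_value`, `robust_q_learning`) throughout.

```python
import numpy as np

def kl_dual_value(samples, delta, betas):
    """Dual of inf_{P: KL(P||P0)<=delta} E_P[V] for a KL ambiguity set:
        sup_{beta>0} { -beta * log E_{P0}[exp(-V/beta)] - beta * delta }.
    Approximated by plain Monte Carlo over `samples` and a grid over beta
    (the paper instead uses a multi-level Monte Carlo estimator)."""
    best = -np.inf
    for beta in betas:
        z = -samples / beta
        m = z.max()                      # log-mean-exp for numerical stability
        lme = m + np.log(np.mean(np.exp(z - m)))
        best = max(best, -beta * lme - beta * delta)
    return best

def robust_q_learning(P, R, gamma=0.9, delta=0.1, steps=2000, seed=0):
    """Tabular distributionally robust Q-learning sketch on a known simulator.
    P[s, a] is the nominal next-state distribution; the Bellman target uses
    the KL dual value in place of a plain expectation. Illustrative only."""
    rng = np.random.default_rng(seed)
    nS, nA = R.shape
    Q = np.zeros((nS, nA))
    betas = np.logspace(-2, 2, 25)
    for t in range(steps):
        s, a = rng.integers(nS), rng.integers(nA)
        # draw next-state samples from the simulator, form the robust target
        ns = rng.choice(nS, size=32, p=P[s, a])
        v_samples = Q[ns].max(axis=1)
        target = R[s, a] + gamma * kl_dual_value(v_samples, delta, betas)
        alpha = 1.0 / (1 + 0.01 * t)     # decaying step size
        Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

Because the KL dual value is never larger than the nominal expectation (by Jensen's inequality, and the $\beta\delta$ penalty only lowers it further), this update learns a pessimistic value estimate, which is the mechanism behind robustness to distributional shift.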

Author Information

Zijian Liu (Boston University)
Zhengqing Zhou (Stanford University)
Perry Dong (University of California, Berkeley)
Jerry Bai (Horizon Robotics)
Jose Blanchet (Stanford University)
Wei Xu (Horizon Robotics)
Zhengyuan Zhou (Arena Technologies & NYU)