
Group Distributionally Robust Reinforcement Learning with Hierarchical Latent Variables
Mengdi Xu · Peide Huang · Visak Kumar · Jielin Qiu · Chao Fang · Kuan-Hui Lee · Xuewei Qi · Henry Lam · Bo Li · Ding Zhao

Reinforcement Learning (RL) agents may have only incomplete information about the task to be solved. Although inferring the latent task can improve performance, blindly trusting the task estimate may cause significant performance drops due to inevitable inference errors. A dominant way to enhance robustness is to optimize over the worst possible task, which may yield overly conservative policies. Moreover, most sequential decision-making formulations assume tasks are sampled i.i.d. and overlook the existence of task subpopulations. To address both challenges under task-estimate uncertainty, we propose the Group Distributionally Robust Markov Decision Process (GDR-MDP). GDR-MDP flexibly encodes prior task relationships via a latent mixture model and leverages the prior by dynamically updating a belief distribution over mixtures. Its distributionally robust decision criterion is to find the optimal policy that maximizes the expected return under the worst possible qualified belief within an ambiguity set. We show both theoretically and empirically that GDR-MDP's hierarchical structure further enhances distributional robustness against belief inference errors.
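The max-min decision criterion described in the abstract can be sketched as follows. This is an illustrative rendering under assumed notation, not the paper's exact formulation: here $b_t$ is the current belief over mixture components, $\mathcal{C}(b_t)$ the ambiguity set of qualified beliefs, $z$ a latent mixture component, and $\mathcal{M}_z$ the MDP it induces.

```latex
% Illustrative sketch (assumed notation): the agent maximizes return
% under the worst qualified belief in the ambiguity set C(b_t).
\pi^{*} \;=\; \arg\max_{\pi}\;
\min_{\tilde{b} \,\in\, \mathcal{C}(b_t)}\;
\mathbb{E}_{z \sim \tilde{b}}
\Big[\,
  \mathbb{E}_{\tau \sim (\pi,\, \mathcal{M}_z)}
  \sum_{t=0}^{\infty} \gamma^{t}\, r_{t}
\,\Big]
```

The inner minimization guards against belief inference errors, while restricting it to the ambiguity set $\mathcal{C}(b_t)$, rather than to all tasks, avoids the over-conservatism of pure worst-case task optimization.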

Author Information

Mengdi Xu (Carnegie Mellon University)
Peide Huang (Carnegie Mellon University)
Visak Kumar (Toyota Research Institute)
Jielin Qiu (Carnegie Mellon University)
Chao Fang (Toyota Research Institute)
Kuan-Hui Lee (Toyota Research Institute)
Xuewei Qi (General Motors)
Henry Lam (Columbia University)
Bo Li (UIUC)
Ding Zhao (Carnegie Mellon University)