

Poster in Workshop: Principles of Distribution Shift (PODS)

Group Distributionally Robust Reinforcement Learning with Hierarchical Latent Variables

Mengdi Xu · Peide Huang · Visak Kumar · Jielin Qiu · Chao Fang · Kuan-Hui Lee · Xuewei Qi · Henry Lam · Bo Li · Ding Zhao


Abstract:

Reinforcement Learning (RL) agents may only have incomplete information about the task to solve. Although inferring the latent task can improve performance, blindly trusting the task estimates may cause significant performance drops due to inevitable inference errors. A dominant way to enhance robustness is to optimize over the worst-possible tasks, which may generate overly conservative policies. Moreover, most sequential decision-making formulations assume tasks are i.i.d. sampled and overlook the existence of task subpopulations. To address both challenges under task-estimate uncertainty, we propose the Group Distributionally Robust Markov Decision Process (GDR-MDP). GDR-MDP flexibly encodes prior task relationships via a latent mixture model and leverages the prior by dynamically updating a belief distribution over mixtures. GDR-MDP's distributionally robust decision criterion is to find the optimal policy that maximizes the expected return under the worst-possible qualified belief within an ambiguity set. We show both theoretically and empirically that GDR-MDP's hierarchical structure further enhances distributional robustness over belief inference errors.
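
A minimal sketch of the robust decision criterion described in the abstract, under assumed notation not given on this page: b denotes a belief over mixture components z, \mathcal{B} the ambiguity set of qualified beliefs around the inferred belief, \mathcal{M}_z the task MDP associated with mixture z, and \gamma the discount factor.

% Hedged sketch; symbols b, \mathcal{B}, z, \mathcal{M}_z are illustrative assumptions, not the paper's notation.
\max_{\pi} \; \min_{b \in \mathcal{B}} \;
\mathbb{E}_{z \sim b} \,
\mathbb{E}_{\tau \sim (\pi, \mathcal{M}_z)}
\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \right]

The outer maximization chooses the policy, while the inner minimization takes the worst qualified belief within the ambiguity set, so the policy is robust to belief inference errors without optimizing against the single worst task.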
