Group Distributionally Robust Reinforcement Learning with Hierarchical Latent Variables
Mengdi Xu · Peide Huang · Visak Kumar · Jielin Qiu · Chao Fang · Kuan-Hui Lee · Xuewei Qi · Henry Lam · Bo Li · Ding Zhao

Reinforcement Learning (RL) agents may have only incomplete information about the tasks they must solve. Although inferring the latent task can improve performance, blindly trusting the task estimates may cause significant performance drops due to inevitable inference errors. One dominant way to enhance robustness is to optimize over the worst possible tasks, which may produce overly conservative policies. Moreover, most sequential decision-making formulations assume tasks are i.i.d. sampled and overlook the existence of task subpopulations. To address both challenges under task-estimate uncertainty, we propose the Group Distributionally Robust Markov Decision Process (GDR-MDP). GDR-MDP flexibly encodes prior task relationships via a latent mixture model and leverages this prior by dynamically updating a belief distribution over mixtures. Its distributionally robust decision criterion is to find the optimal policy that maximizes the expected return under the worst possible qualified belief within an ambiguity set. We show both theoretically and empirically that GDR-MDP's hierarchical structure further enhances distributional robustness against belief inference errors.
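As a rough sketch (the notation below is assumed for illustration and is not taken verbatim from the paper), the decision criterion can be written as a max-min problem over beliefs b on the mixture components z, where \hat{b} denotes the current inferred belief and \mathcal{B}(\hat{b}) an ambiguity set of qualified beliefs around it:

\[
\pi^{*} \in \arg\max_{\pi} \; \min_{b \in \mathcal{B}(\hat{b})} \; \mathbb{E}_{z \sim b} \, \mathbb{E}_{\tau \sim p(\cdot \mid z)} \, \mathbb{E}^{\pi}_{\tau} \!\left[ \sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t) \right],
\]

where tasks \tau are drawn from the mixture component indexed by z and \gamma is the discount factor. The hierarchical structure enters through the latent mixture: the ambiguity set is placed over beliefs on mixtures rather than over individual tasks, which, per the abstract, guards against belief inference errors without resorting to worst-case optimization over every task.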

Author Information

Mengdi Xu (Carnegie Mellon University)
Peide Huang (Carnegie Mellon University)
Visak Kumar (Toyota Research Institute)
Jielin Qiu (Carnegie Mellon University)
Chao Fang (Toyota Research Institute)
Kuan-Hui Lee (Toyota Research Institute)
Xuewei Qi (General Motors)
Henry Lam (Columbia University)
Bo Li (UIUC)

Dr. Bo Li is an assistant professor in the Department of Computer Science at the University of Illinois at Urbana–Champaign. She is the recipient of the IJCAI Computers and Thought Award, the Alfred P. Sloan Research Fellowship, AI's 10 to Watch, the NSF CAREER Award, the MIT Technology Review TR-35 Award, the Dean's Award for Excellence in Research, the C.W. Gear Outstanding Junior Faculty Award, the Intel Rising Star Award, the Symantec Research Labs Fellowship, the Rising Star Award, research awards from technology companies such as Amazon, Facebook, Intel, IBM, and eBay, and best paper awards at several top machine learning and security conferences. Her research focuses on both theoretical and practical aspects of trustworthy machine learning, which lies at the intersection of machine learning, security, privacy, and game theory. She has designed several scalable frameworks for trustworthy machine learning and privacy-preserving data publishing. Her work has been featured by major publications and media outlets such as Nature, Wired, Fortune, and the New York Times.

Ding Zhao (Carnegie Mellon University)
