
Deconfounded Value Decomposition for Multi-Agent Reinforcement Learning
Jiahui Li · Kun Kuang · Baoxiang Wang · Furui Liu · Long Chen · Changjie Fan · Fei Wu · Jun Xiao

Tue Jul 19 02:25 PM -- 02:30 PM (PDT) @ Room 318 - 320

Value decomposition (VD) methods have been widely used in cooperative multi-agent reinforcement learning (MARL), where credit assignment plays an important role in guiding the agents’ decentralized execution. In this paper, we investigate VD from the novel perspective of causal inference. We first show that, in existing VD methods, the environment acts as an unobserved confounder: it is a common cause of both the global state and the joint value function, which introduces confounding bias into the learning of credit assignment. We then present our approach, deconfounded value decomposition (DVD), which cuts off the backdoor confounding path from the global state to the joint value function. The cut is implemented by introducing the trajectory graph, which depends only on the local trajectories, as a proxy confounder. DVD is general enough to be applied to various VD methods, and extensive experiments show that DVD consistently achieves significant performance gains over different state-of-the-art VD methods on the StarCraft II and MACO benchmarks.
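For readers unfamiliar with value decomposition, the basic idea can be illustrated with a minimal sketch. This is not the DVD method from the abstract: it assumes the simplest additive (VDN-style) mixing, where the joint value is the sum of per-agent utilities, and all names and numbers below are hypothetical. It shows why such a decomposition supports decentralized execution: each agent maximizing its own utility recovers the joint greedy action.

```python
from itertools import product

# Hypothetical per-agent utilities Q_i(a_i): two agents, three actions each.
# In practice these would come from learned networks over local trajectories.
agent_utilities = [
    [1.0, 3.0, 2.0],  # agent 0
    [0.5, 0.2, 4.0],  # agent 1
]


def joint_value(joint_action):
    """Additive (VDN-style) mixing: Q_tot is the sum of per-agent utilities."""
    return sum(q[a] for q, a in zip(agent_utilities, joint_action))


def decentralized_greedy():
    """Each agent picks its own argmax using only its local utilities."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in agent_utilities)


def centralized_greedy():
    """Brute-force argmax of Q_tot over every joint action."""
    action_spaces = [range(len(q)) for q in agent_utilities]
    return max(product(*action_spaces), key=joint_value)


local_choice = decentralized_greedy()
joint_choice = centralized_greedy()
print(local_choice, joint_choice, joint_value(local_choice))
```

Under additive mixing the two choices coincide by construction. Other VD methods replace the sum with a learned mixing function that typically conditions on the global state, which is the pathway the abstract identifies as confounded.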

Author Information

Jiahui Li (Zhejiang University)
Kun Kuang (Zhejiang University)

Kun Kuang is an Associate Professor in the College of Computer Science and Technology at Zhejiang University. He received his Ph.D. from the Department of Computer Science and Technology at Tsinghua University in 2019 and was a visiting scholar at Stanford University. His main research interests include causal inference, artificial intelligence, and causally regularized machine learning. He has published over 30 papers in major international journals and conferences, including SIGKDD, ICML, ACM MM, AAAI, IJCAI, TKDE, TKDD, Engineering, and ICDM.

Baoxiang Wang (The Chinese University of Hong Kong, Shenzhen)
Furui Liu (Huawei Noah's Ark Lab)
Long Chen (Columbia University)
Changjie Fan (NetEase Fuxi AI Lab)
Fei Wu (Zhejiang University)
Jun Xiao (Zhejiang University)
