Many advances in cooperative multi-agent reinforcement learning (MARL) are based on two common design principles: value decomposition and parameter sharing. A typical MARL algorithm of this fashion decomposes a centralized Q-function into local Q-networks with parameters shared across agents. Such an algorithmic paradigm enables centralized training and decentralized execution (CTDE) and leads to efficient learning in practice. Despite these advantages, we revisit the two principles and show that in certain scenarios, e.g., environments with a highly multi-modal reward landscape, value decomposition and parameter sharing can be problematic and lead to undesired outcomes. In contrast, policy gradient (PG) methods with individual policies provably converge to an optimal solution in these cases, which partially supports some recent empirical observations that PG can be effective in many MARL testbeds. Inspired by our theoretical analysis, we present practical suggestions on implementing multi-agent PG algorithms for either high rewards or diverse emergent behaviors, and we empirically validate our findings on a variety of domains, ranging from simplified matrix and grid-world games to complex benchmarks such as the StarCraft Multi-Agent Challenge and Google Research Football. We hope our insights can benefit the community in developing more general and more powerful MARL algorithms.
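To make the failure mode concrete, here is a minimal sketch (not from the paper itself) of a hypothetical 2-agent, 2-action matrix game with a multi-modal reward landscape: the team is rewarded only when the agents pick *different* actions, so both joint actions (0, 1) and (1, 0) are optimal. With full parameter sharing and deterministic policies, both agents necessarily select the same action and the team reward stays at zero, while individual policies can break the symmetry. The game definition and function names below are illustrative assumptions.

```python
import itertools

# Hypothetical "XOR"-style cooperative matrix game: the team earns
# reward 1 only when the two agents choose different actions, so the
# reward landscape has two symmetric optimal modes, (0,1) and (1,0).
REWARD = {
    (0, 0): 0.0, (0, 1): 1.0,
    (1, 0): 1.0, (1, 1): 0.0,
}

def best_shared_deterministic_return():
    # Under full parameter sharing with deterministic policies, both
    # agents map the (identical) observation to the same action, so
    # only the diagonal joint actions (a, a) are reachable.
    return max(REWARD[(a, a)] for a in (0, 1))

def best_individual_return():
    # Individual (non-shared) policies can realize any joint action
    # and therefore reach one of the optimal asymmetric modes.
    return max(REWARD[joint] for joint in itertools.product((0, 1), repeat=2))

print(best_shared_deterministic_return())  # 0.0
print(best_individual_return())            # 1.0
```

The gap between the two returns is the point: no amount of training fixes the shared deterministic policy here, because the restriction is representational, not an optimization failure.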
Author Information
Wei Fu (Tsinghua University)
Chao Yu (Tsinghua University)
Zelai Xu (Tsinghua University)
Jiaqi Yang (University of California, Berkeley)
Yi Wu (UC Berkeley)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Spotlight: Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning »
  Tue. Jul 19th 08:40 -- 08:45 PM, Room 318 - 320
More from the Same Authors
- 2021: Disentangled Attention as Intrinsic Regularization for Bimanual Multi-Object Manipulation »
  Minghao Zhang · Pingcheng Jian · Yi Wu · Harry (Huazhe) Xu · Xiaolong Wang
- 2022: Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning »
  Zhecheng Yuan · Zhengrong Xue · Bo Yuan · Xueqian Wang · Yi Wu · Yang Gao · Huazhe Xu
- 2022 Poster: Phasic Self-Imitative Reduction for Sparse-Reward Goal-Conditioned Reinforcement Learning »
  Yunfei Li · Tian Gao · Jiaqi Yang · Huazhe Xu · Yi Wu
- 2022 Spotlight: Phasic Self-Imitative Reduction for Sparse-Reward Goal-Conditioned Reinforcement Learning »
  Yunfei Li · Tian Gao · Jiaqi Yang · Huazhe Xu · Yi Wu
- 2018 Poster: Discrete-Continuous Mixtures in Probabilistic Programming: Generalized Semantics and Inference Algorithms »
  Yi Wu · Siddharth Srivastava · Nicholas Hay · Simon Du · Stuart Russell
- 2018 Oral: Discrete-Continuous Mixtures in Probabilistic Programming: Generalized Semantics and Inference Algorithms »
  Yi Wu · Siddharth Srivastava · Nicholas Hay · Simon Du · Stuart Russell