Value decomposition has recently injected new vitality into multi-agent actor-critic methods. However, existing decomposed actor-critic methods cannot guarantee convergence to the global optimum. In this paper, we present a novel multi-agent actor-critic method, FOP, which can factorize the optimal joint policy induced by maximum-entropy multi-agent reinforcement learning (MARL) into individual policies. Theoretically, we prove that the factorized individual policies of FOP converge to the global optimum. Empirically, in the well-known matrix game and differential game, we verify that FOP converges to the global optimum for both discrete and continuous action spaces. We also evaluate FOP on a set of StarCraft II micromanagement tasks and demonstrate that FOP substantially outperforms state-of-the-art decomposed value-based and actor-critic methods.
Author Information
Tianhao Zhang (Peking University)
yueheng li (Peking University)
Chen Wang (Peking University)
Guangming Xie (1. State Key Laboratory for Turbulence and Complex Systems, College of Engineering, Peking University; 2. Institute of Ocean Research, Peking University)
Zongqing Lu (Peking University)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Spotlight: FOP: Factorizing Optimal Joint Policy of Maximum-Entropy Multi-Agent Reinforcement Learning
  Wed. Jul 21st 12:45 -- 12:50 AM
More from the Same Authors
- 2022 Poster: Divergence-Regularized Multi-Agent Actor-Critic
  Kefan Su · Zongqing Lu
- 2022 Poster: Difference Advantage Estimation for Multi-Agent Policy Gradients
  yueheng li · Guangming Xie · Zongqing Lu
- 2022 Poster: Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning
  Haoqi Yuan · Zongqing Lu
- 2022 Spotlight: Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning
  Haoqi Yuan · Zongqing Lu
- 2022 Spotlight: Divergence-Regularized Multi-Agent Actor-Critic
  Kefan Su · Zongqing Lu
- 2022 Spotlight: Difference Advantage Estimation for Multi-Agent Policy Gradients
  yueheng li · Guangming Xie · Zongqing Lu
- 2021: RL + Operations Research Panel
  Jim Dai · Fei Fang · Shie Mannor · Yuandong Tian · Zhiwei (Tony) Qin · Zongqing Lu
- 2021 Workshop: Reinforcement Learning for Real Life
  Yuxi Li · Minmin Chen · Omer Gottesman · Lihong Li · Zongqing Lu · Rupam Mahmood · Niranjani Prasad · Zhiwei (Tony) Qin · Csaba Szepesvari · Matthew Taylor
- 2021 Poster: The Emergence of Individuality
  Jiechuan Jiang · Zongqing Lu
- 2021 Oral: The Emergence of Individuality
  Jiechuan Jiang · Zongqing Lu