FOP: Factorizing Optimal Joint Policy of Maximum-Entropy Multi-Agent Reinforcement Learning
Tianhao Zhang · Yueheng Li · Chen Wang · Guangming Xie · Zongqing Lu

Tue Jul 20 09:00 PM -- 11:00 PM (PDT)

Value decomposition has recently injected new vitality into multi-agent actor-critic methods. However, existing decomposed actor-critic methods cannot guarantee convergence to the global optimum. In this paper, we present a novel multi-agent actor-critic method, FOP, which factorizes the optimal joint policy induced by maximum-entropy multi-agent reinforcement learning (MARL) into individual policies. Theoretically, we prove that the factorized individual policies of FOP converge to the global optimum. Empirically, on a well-known matrix game and a differential game, we verify that FOP converges to the global optimum for both discrete and continuous action spaces. We also evaluate FOP on a set of StarCraft II micromanagement tasks and demonstrate that FOP substantially outperforms state-of-the-art decomposed value-based and actor-critic methods.
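To make the central notion concrete, the sketch below illustrates what a maximum-entropy optimal *joint* policy and a factorization into individual policies look like in a small matrix game. The payoff matrix, temperature, and the marginal-based factorization here are illustrative assumptions, not the paper's FOP algorithm, which *learns* a factorization with a convergence guarantee:

```python
import math

# Hypothetical 2-agent, 3-action matrix game payoff (illustrative only,
# not taken from the paper).
Q = [[8.0, -12.0, -12.0],
     [-12.0, 0.0, 0.0],
     [-12.0, 0.0, 6.0]]

alpha = 1.0  # entropy temperature of the maximum-entropy objective

# Maximum-entropy optimal joint policy: a Boltzmann distribution over
# joint actions, pi*(a1, a2) proportional to exp(Q(a1, a2) / alpha).
z = sum(math.exp(q / alpha) for row in Q for q in row)
joint = [[math.exp(q / alpha) / z for q in row] for row in Q]

# "Factorized individual policies" in the simplest sense: per-agent
# marginals of the joint policy.  (FOP instead trains individual
# policies whose combination provably recovers the optimal joint
# policy; the marginals here only sketch the idea of factorization.)
pi1 = [sum(row) for row in joint]                              # agent 1
pi2 = [sum(joint[i][j] for i in range(3)) for j in range(3)]   # agent 2

print(round(sum(pi1), 6), round(sum(pi2), 6))
```

Each agent's factorized policy is a valid distribution over its own actions, so decentralized execution only requires each agent to sample from its individual policy.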

Author Information

Tianhao Zhang (Peking University)
Yueheng Li (Peking University)
Chen Wang (Peking University)
Guangming Xie (1. State Key Laboratory for Turbulence and Complex Systems, College of Engineering, Peking University; 2. Institute of Ocean Research, Peking University)
Zongqing Lu (Peking University)
