Entropy regularization is a popular method in reinforcement learning (RL). Although it has many advantages, it alters the RL objective and causes the converged policy to deviate from the optimal policy of the original Markov Decision Process (MDP). Although divergence regularization has been proposed to address this problem, it cannot be trivially applied to cooperative multi-agent reinforcement learning (MARL). In this paper, we investigate divergence regularization in cooperative MARL and propose a novel off-policy cooperative MARL framework, divergence-regularized multi-agent actor-critic (DMAC). Theoretically, we derive the update rule of DMAC, which is naturally off-policy, guarantees monotonic policy improvement and convergence in both the original MDP and the divergence-regularized MDP, and is not biased by the regularization. We also derive a bound on the discrepancy between the converged policy and the optimal policy in the original MDP. DMAC is a flexible framework and can be combined with many existing MARL algorithms. Empirically, we evaluate DMAC in a didactic stochastic game and the StarCraft Multi-Agent Challenge and show that DMAC substantially improves the performance of existing MARL algorithms.
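To make the idea of divergence regularization concrete, below is a minimal sketch of a divergence-regularized actor loss: the usual policy-gradient term plus a KL penalty that keeps the learned policy pi close to a reference policy mu. This is not the DMAC update rule derived in the paper, only a generic single-agent illustration; all names (beta, logits_pi, logits_mu, advantages) are illustrative assumptions, and discrete actions are assumed so the KL term can be computed in closed form.

```python
import torch
import torch.nn.functional as F

def divergence_regularized_actor_loss(logits_pi, logits_mu, actions,
                                      advantages, beta=0.1):
    """Policy-gradient loss with a KL(pi || mu) regularizer (illustrative sketch).

    logits_pi:  (batch, n_actions) logits of the current policy pi
    logits_mu:  (batch, n_actions) logits of the reference policy mu
    actions:    (batch,) sampled actions (long tensor)
    advantages: (batch,) advantage estimates
    beta:       regularization coefficient
    """
    log_pi = F.log_softmax(logits_pi, dim=-1)
    log_mu = F.log_softmax(logits_mu, dim=-1).detach()  # reference policy is fixed
    # Standard policy-gradient term on the actions actually taken.
    chosen_log_pi = log_pi.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    pg_loss = -(advantages.detach() * chosen_log_pi).mean()
    # KL(pi || mu) = sum_a pi(a) * (log pi(a) - log mu(a)), in closed form
    # over the discrete action set.
    kl = (log_pi.exp() * (log_pi - log_mu)).sum(-1).mean()
    return pg_loss + beta * kl
```

With mu set to a uniform policy, the KL term reduces to (negative) entropy regularization up to a constant; using a non-uniform reference is what lets divergence regularization avoid biasing the objective the way an entropy bonus does.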
Author Information
Kefan Su (Peking University)
Zongqing Lu (Peking University)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Spotlight: Divergence-Regularized Multi-Agent Actor-Critic
  Thu. Jul 21st, 05:55 -- 06:00 PM, Room 327 - 329
More from the Same Authors
- 2022 Poster: Difference Advantage Estimation for Multi-Agent Policy Gradients
  Yueheng Li · Guangming Xie · Zongqing Lu
- 2022 Poster: Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning
  Haoqi Yuan · Zongqing Lu
- 2022 Spotlight: Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning
  Haoqi Yuan · Zongqing Lu
- 2022 Spotlight: Difference Advantage Estimation for Multi-Agent Policy Gradients
  Yueheng Li · Guangming Xie · Zongqing Lu
- 2021: RL + Operations Research Panel
  Jim Dai · Fei Fang · Shie Mannor · Yuandong Tian · Zhiwei (Tony) Qin · Zongqing Lu
- 2021 Workshop: Reinforcement Learning for Real Life
  Yuxi Li · Minmin Chen · Omer Gottesman · Lihong Li · Zongqing Lu · Rupam Mahmood · Niranjani Prasad · Zhiwei (Tony) Qin · Csaba Szepesvari · Matthew Taylor
- 2021 Poster: The Emergence of Individuality
  Jiechuan Jiang · Zongqing Lu
- 2021 Poster: FOP: Factorizing Optimal Joint Policy of Maximum-Entropy Multi-Agent Reinforcement Learning
  Tianhao Zhang · Yueheng Li · Chen Wang · Guangming Xie · Zongqing Lu
- 2021 Oral: The Emergence of Individuality
  Jiechuan Jiang · Zongqing Lu
- 2021 Spotlight: FOP: Factorizing Optimal Joint Policy of Maximum-Entropy Multi-Agent Reinforcement Learning
  Tianhao Zhang · Yueheng Li · Chen Wang · Guangming Xie · Zongqing Lu