Multi-agent policy gradient methods under centralized training with decentralized execution have recently made considerable progress. During centralized training, multi-agent credit assignment is crucial and can substantially improve learning performance. However, explicit multi-agent credit assignment in multi-agent policy gradient methods has received comparatively little attention. In this paper, we investigate multi-agent credit assignment induced by reward shaping and provide a theoretical understanding of it in terms of credit assignment and policy bias. Based on this, we propose an exponentially weighted advantage estimator, analogous to GAE, that enables multi-agent credit assignment while allowing a tradeoff with policy bias. Empirical results show that our approach performs effective multi-agent credit assignment and thus substantially outperforms other advantage estimators.
Author Information
Yueheng Li (Peking University)
Guangming Xie (1. State Key Laboratory for Turbulence and Complex Systems, College of Engineering, Peking University; 2. Center for Multi-Agent Research, Institute for Artificial Intelligence, Peking University)
Zongqing Lu (Peking University)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Poster: Difference Advantage Estimation for Multi-Agent Policy Gradients »
Thu. Jul 21st through Fri. Jul 22nd, Room: Hall E #803
More from the Same Authors
-
2022 Poster: Divergence-Regularized Multi-Agent Actor-Critic »
Kefan Su · Zongqing Lu -
2022 Poster: Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning »
Haoqi Yuan · Zongqing Lu -
2022 Spotlight: Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning »
Haoqi Yuan · Zongqing Lu -
2022 Spotlight: Divergence-Regularized Multi-Agent Actor-Critic »
Kefan Su · Zongqing Lu -
2021 : RL + Operations Research Panel »
Jim Dai · Fei Fang · Shie Mannor · Yuandong Tian · Zhiwei (Tony) Qin · Zongqing Lu -
2021 Workshop: Reinforcement Learning for Real Life »
Yuxi Li · Minmin Chen · Omer Gottesman · Lihong Li · Zongqing Lu · Rupam Mahmood · Niranjani Prasad · Zhiwei (Tony) Qin · Csaba Szepesvari · Matthew Taylor -
2021 Poster: The Emergence of Individuality »
Jiechuan Jiang · Zongqing Lu -
2021 Poster: FOP: Factorizing Optimal Joint Policy of Maximum-Entropy Multi-Agent Reinforcement Learning »
Tianhao Zhang · Yueheng Li · Chen Wang · Guangming Xie · Zongqing Lu -
2021 Oral: The Emergence of Individuality »
Jiechuan Jiang · Zongqing Lu -
2021 Spotlight: FOP: Factorizing Optimal Joint Policy of Maximum-Entropy Multi-Agent Reinforcement Learning »
Tianhao Zhang · Yueheng Li · Chen Wang · Guangming Xie · Zongqing Lu