Difference Advantage Estimation for Multi-Agent Policy Gradients
Yueheng Li · Guangming Xie · Zongqing Lu

Multi-agent policy gradient methods in centralized training with decentralized execution have recently witnessed much progress. During centralized training, multi-agent credit assignment is crucial and can substantially improve learning performance. However, explicit multi-agent credit assignment in multi-agent policy gradient methods still receives little attention. In this paper, we investigate multi-agent credit assignment induced by reward shaping and provide a theoretical understanding of its credit assignment and policy bias. Based on this, we propose an exponentially weighted advantage estimator, analogous to GAE, that enables multi-agent credit assignment while allowing a tradeoff with policy bias. Empirical results show that our approach performs effective multi-agent credit assignment and thus substantially outperforms other advantage estimators.
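The abstract does not specify the proposed estimator's exact form, but it is described as analogous to GAE (generalized advantage estimation). As a point of reference, the following is a minimal sketch of standard single-agent GAE, where one-step TD errors are exponentially weighted by a factor gamma * lambda; the function name and array conventions (values holds T+1 state-value estimates for T rewards) are illustrative assumptions, not the paper's method.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Standard GAE: exponentially weighted sum of one-step TD errors.

    rewards: length-T array of rewards r_t.
    values:  length-(T+1) array of value estimates V(s_t), with
             values[T] the bootstrap value of the final state.
    Returns a length-T array of advantage estimates.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    running = 0.0
    # Work backwards so each step reuses the accumulated weighted sum.
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With lam=1 this recovers the Monte Carlo advantage (low bias, high variance); with lam=0 it reduces to the one-step TD error (high bias, low variance). The paper's estimator applies an analogous exponential weighting to trade credit assignment against policy bias in the multi-agent setting.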

Author Information

Yueheng Li (Peking University)
Guangming Xie (1. State Key Laboratory for Turbulence and Complex Systems, College of Engineering, Peking University; 2. Center for Multi-Agent Research, Institute for Artificial Intelligence, Peking University)
Zongqing Lu (Peking University)
