We consider the problem of learning fair policies in (deep) cooperative multi-agent reinforcement learning (MARL). We formalize it in a principled way as the problem of optimizing a welfare function that explicitly encodes two important aspects of fairness: efficiency and equity. We provide a theoretical analysis of the convergence of policy gradient for this problem. As a solution method, we propose a novel neural network architecture composed of two sub-networks, each specifically designed to take into account one of these two aspects of fairness. In experiments, we demonstrate the importance of the two sub-networks for fair optimization. Our overall approach is general, as it can accommodate any (sub)differentiable welfare function. It is therefore compatible with various notions of fairness that have been proposed in the literature (e.g., lexicographic maximin, generalized Gini social welfare function, proportional fairness). Our method can also be implemented in various MARL settings: centralized training and decentralized execution, or fully decentralized. Finally, we experimentally validate our approach in various domains and show that it can perform much better than previous methods, both in terms of efficiency and equity.
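To make the notion of a welfare function concrete, the following is a minimal sketch of two of the welfare functions named in the abstract, the generalized Gini social welfare function and proportional fairness, evaluated on per-agent utilities. It is an illustration under stated assumptions, not the authors' implementation: the function names, the weight vector, and the example utilities are all hypothetical, and the code uses the standard textbook definitions (weighted sum of ascending-sorted utilities with non-increasing weights; sum of log utilities).

```python
import math


def generalized_gini(utilities, weights):
    """Generalized Gini welfare (standard definition, assumed here):
    sort utilities in ascending order and take a weighted sum with
    non-increasing weights, so worse-off agents count the most."""
    if len(utilities) != len(weights):
        raise ValueError("utilities and weights must have the same length")
    if any(a < b for a, b in zip(weights, weights[1:])):
        raise ValueError("weights must be non-increasing")
    return sum(w * u for w, u in zip(weights, sorted(utilities)))


def proportional_fairness(utilities):
    """Proportional fairness (standard definition, assumed here):
    sum of log utilities; requires strictly positive utilities."""
    if any(u <= 0 for u in utilities):
        raise ValueError("utilities must be strictly positive")
    return sum(math.log(u) for u in utilities)


# Hypothetical example: two utility profiles with the same total (6.0).
weights = [1.0, 0.5, 0.25]     # illustrative weights, not from the paper
equal = [2.0, 2.0, 2.0]        # equitable profile
unequal = [0.5, 1.5, 4.0]      # same efficiency, less equity

gini_equal = generalized_gini(equal, weights)
gini_unequal = generalized_gini(unequal, weights)
pf_equal = proportional_fairness(equal)
pf_unequal = proportional_fairness(unequal)
```

Both profiles have the same total utility, yet both welfare functions score the equal profile higher, which is the sense in which such functions encode equity alongside efficiency.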
Matthieu Zimmer (Shanghai Jiao Tong University)
[Actively looking for a research scientist position.] Matthieu Zimmer received the Ph.D. degree in computer science from the University of Lorraine in 2018 and the M.S. degree in computer science from the University Pierre and Marie Curie in 2014. Since 2018, he has been a postdoctoral researcher at the joint institute of the University of Michigan and Shanghai Jiao Tong University in China. His current research interests include deep reinforcement learning, transfer learning, multi-agent systems, and meta-learning.
Claire Glanois (Shanghai Jiao Tong University)
Umer Siddique (Shanghai Jiao Tong University)
Paul Weng (Shanghai Jiao Tong University)
Related Events (a corresponding poster, oral, or spotlight)
2021 Poster: Learning Fair Policies in Decentralized Cooperative Multi-Agent Reinforcement Learning
Wed Jul 21st 04:00 -- 06:00 AM Room Virtual
More from the Same Authors
2020 Poster: Learning Fair Policies in Multi-Objective (Deep) Reinforcement Learning with Average and Discounted Rewards
Umer Siddique · Paul Weng · Matthieu Zimmer