Centralized training with decentralized execution has become an important paradigm in multi-agent learning. Though practical, current methods rely on restrictive assumptions to decompose the centralized value function across agents for execution. In this paper, we eliminate this restriction by proposing multi-agent determinantal Q-learning. Our method is built on Q-DPP, a novel extension of the determinantal point process (DPP) to the multi-agent setting. Q-DPP encourages agents to acquire diverse behavioral models; this allows a natural factorization of the joint Q-functions with no need for a priori structural constraints on the value function or special network architectures. We demonstrate that Q-DPP generalizes major solutions including VDN, QMIX, and QTRAN on decentralizable cooperative tasks. To efficiently draw samples from Q-DPP, we develop a linear-time sampler with a theoretical approximation guarantee. Our sampler also benefits exploration by coordinating agents to cover orthogonal directions in the state space during training. We evaluate our algorithm on multiple cooperative benchmarks and demonstrate its effectiveness in comparison with state-of-the-art methods.
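To illustrate why a determinantal point process favors diversity, here is a minimal sketch of a standard L-ensemble DPP over three items, where the probability of a subset Y is det(L_Y) / det(L + I). This is not the Q-DPP construction from the paper: the feature vectors and the helper names (`det`, `gram`, `dpp_probs`) are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0  # treat the matrix as (numerically) singular
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result  # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return result


def gram(features):
    """Kernel L with L[i][j] = <features[i], features[j]>."""
    return [[sum(x * y for x, y in zip(u, v)) for v in features]
            for u in features]


def dpp_probs(features):
    """Probability of every subset Y under an L-ensemble DPP."""
    n = len(features)
    L = gram(features)
    L_plus_I = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                for i in range(n)]
    z = det(L_plus_I)  # normalizer: the sum of det(L_Y) over all subsets Y
    probs = {}
    for k in range(n + 1):
        for Y in combinations(range(n), k):
            sub = [[L[i][j] for j in Y] for i in Y]
            probs[Y] = (det(sub) if Y else 1.0) / z
    return probs


# Hypothetical 2-D features: items 0 and 1 are orthogonal,
# item 2 is almost parallel to item 0.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
probs = dpp_probs(features)

for Y in [(0, 1), (0, 2)]:
    print(Y, probs[Y])
```

In this toy setup the orthogonal pair (0, 1) receives a much higher probability than the near-parallel pair (0, 2), because the determinant of a Gram submatrix equals the squared volume spanned by the selected feature vectors. The subset containing all three items gets zero probability, since three vectors in a 2-D space are linearly dependent.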
Author Information
Yaodong Yang (Huawei UK)
Ying Wen (UCL)
Jun Wang (UCL)
Liheng Chen (Shanghai Jiao Tong University)
Kun Shao (Huawei Noah's Ark Lab)
David Mguni (Noah's Ark Laboratory, Huawei)
Weinan Zhang (Shanghai Jiao Tong University)
More from the Same Authors
- 2023 Poster: GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models
  Hanjing Wang · Man-Kit Sit · Congjie He · Ying Wen · Weinan Zhang · Jun Wang · Yaodong Yang · Luo Mai
- 2023 Poster: MANSA: Learning Fast and Slow in Multi-Agent Systems
  David Mguni · Haojun Chen · Taher Jafferjee · Jianhong Wang · Longfei Yue · Xidong Feng · Stephen Mcaleer · Feifei Tong · Jun Wang · Yaodong Yang
- 2023 Poster: Regret-Minimizing Double Oracle for Extensive-Form Games
  Xiaohang Tang · Le Cong Dinh · Stephen Mcaleer · Yaodong Yang
- 2023 Poster: A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems
  Oliver Slumbers · David Mguni · Stefano Blumberg · Stephen Mcaleer · Yaodong Yang · Jun Wang
- 2022 Poster: Understanding Policy Gradient Algorithms: A Sensitivity-Based Approach
  Shuang Wu · Ling Shi · Jun Wang · Guangjian Tian
- 2022 Poster: Plan Your Target and Learn Your Skills: Transferable State-Only Imitation Learning via Decoupled Policy Optimization
  Minghuan Liu · Zhengbang Zhu · Yuzheng Zhuang · Weinan Zhang · Jianye Hao · Yong Yu · Jun Wang
- 2022 Spotlight: Understanding Policy Gradient Algorithms: A Sensitivity-Based Approach
  Shuang Wu · Ling Shi · Jun Wang · Guangjian Tian
- 2022 Spotlight: Plan Your Target and Learn Your Skills: Transferable State-Only Imitation Learning via Decoupled Policy Optimization
  Minghuan Liu · Zhengbang Zhu · Yuzheng Zhuang · Weinan Zhang · Jianye Hao · Yong Yu · Jun Wang
- 2022 Poster: Saute RL: Almost Surely Safe Reinforcement Learning Using State Augmentation
  Aivar Sootla · Alexander I Cowen-Rivers · Taher Jafferjee · Ziyan Wang · David Mguni · Jun Wang · Haitham Bou Ammar
- 2022 Spotlight: Saute RL: Almost Surely Safe Reinforcement Learning Using State Augmentation
  Aivar Sootla · Alexander I Cowen-Rivers · Taher Jafferjee · Ziyan Wang · David Mguni · Jun Wang · Haitham Bou Ammar
- 2021 Poster: Learning in Nonzero-Sum Stochastic Games with Potentials
  David Mguni · Yutong Wu · Yali Du · Yaodong Yang · Ziyi Wang · Minne Li · Ying Wen · Joel Jennings · Jun Wang
- 2021 Poster: Modelling Behavioural Diversity for Learning in Open-Ended Games
  Nicolas Perez-Nieves · Yaodong Yang · Oliver Slumbers · David Mguni · Ying Wen · Jun Wang
- 2021 Poster: Estimating $\alpha$-Rank from A Few Entries with Low Rank Matrix Completion
  Yali Du · Xue Yan · Xu Chen · Jun Wang · Haifeng Zhang
- 2021 Spotlight: Learning in Nonzero-Sum Stochastic Games with Potentials
  David Mguni · Yutong Wu · Yali Du · Yaodong Yang · Ziyi Wang · Minne Li · Ying Wen · Joel Jennings · Jun Wang
- 2021 Oral: Modelling Behavioural Diversity for Learning in Open-Ended Games
  Nicolas Perez-Nieves · Yaodong Yang · Oliver Slumbers · David Mguni · Ying Wen · Jun Wang
- 2021 Spotlight: Estimating $\alpha$-Rank from A Few Entries with Low Rank Matrix Completion
  Yali Du · Xue Yan · Xu Chen · Jun Wang · Haifeng Zhang
- 2020 Poster: Bidirectional Model-based Policy Optimization
  Hang Lai · Jian Shen · Weinan Zhang · Yong Yu
- 2019 Poster: Lipschitz Generative Adversarial Nets
  Zhiming Zhou · Jiadong Liang · Yuxuan Song · Lantao Yu · Hongwei Wang · Weinan Zhang · Yong Yu · Zhihua Zhang
- 2019 Poster: BayesNAS: A Bayesian Approach for Neural Architecture Search
  Hongpeng Zhou · Minghao Yang · Jun Wang · Wei Pan
- 2019 Oral: BayesNAS: A Bayesian Approach for Neural Architecture Search
  Hongpeng Zhou · Minghao Yang · Jun Wang · Wei Pan
- 2019 Oral: Lipschitz Generative Adversarial Nets
  Zhiming Zhou · Jiadong Liang · Yuxuan Song · Lantao Yu · Hongwei Wang · Weinan Zhang · Yong Yu · Zhihua Zhang
- 2018 Poster: Path-Level Network Transformation for Efficient Architecture Search
  Han Cai · Jiacheng Yang · Weinan Zhang · Song Han · Yong Yu
- 2018 Poster: Mean Field Multi-Agent Reinforcement Learning
  Yaodong Yang · Rui Luo · Minne Li · Ming Zhou · Weinan Zhang · Jun Wang
- 2018 Oral: Mean Field Multi-Agent Reinforcement Learning
  Yaodong Yang · Rui Luo · Minne Li · Ming Zhou · Weinan Zhang · Jun Wang
- 2018 Oral: Path-Level Network Transformation for Efficient Architecture Search
  Han Cai · Jiacheng Yang · Weinan Zhang · Song Han · Yong Yu