Recently, deep multiagent reinforcement learning (MARL) has become a highly active research area, as many real-world problems can be inherently viewed as multiagent systems. A particularly interesting and widely applicable class of problems is the partially observable cooperative multiagent setting, in which a team of agents learns to coordinate their behaviors conditioned on their private observations and a commonly shared global reward signal. One natural solution is the centralized training and decentralized execution paradigm. During centralized training, one key challenge is multiagent credit assignment: how to allocate the global rewards to individual agent policies for better coordination towards maximizing system-level benefits. In this paper, we propose a new method called Q-value Path Decomposition (QPD) to decompose the system's global Q-values into individual agents' Q-values. Unlike previous works, which restrict the representational relation between the individual Q-values and the global one, we bring the integrated gradients attribution technique into deep MARL to directly decompose global Q-values along trajectory paths and assign credit to agents. We evaluate QPD on the challenging StarCraft II micromanagement tasks and show that QPD achieves state-of-the-art performance in both homogeneous and heterogeneous multiagent scenarios compared with existing cooperative MARL algorithms.
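To make the attribution idea in the abstract concrete, the following is a minimal sketch of integrated gradients applied to a toy global Q-function. It is not the QPD implementation: the quadratic `global_q`, its weights `W`, the zero baseline, and the two-agent feature layout are all hypothetical, and the integration path here is a straight line from the baseline to the input, whereas the abstract describes decomposing along trajectory paths. The sketch only illustrates the property that makes the technique suitable for credit assignment: the per-feature attributions sum to the change in the global Q-value, so grouping them by agent yields individual credits that add up to the global quantity.

```python
import numpy as np

# Hypothetical weights of a toy global Q-function over 4 features
# (features 0-1 belong to agent 0, features 2-3 to agent 1).
W = np.array([0.5, -1.0, 2.0, 0.3])


def global_q(x):
    """Toy global Q-value: a linear term plus one cross-agent interaction."""
    return float(W @ x + x[0] * x[2])


def grad_global_q(x):
    """Analytic gradient of global_q with respect to x."""
    g = W.copy()
    g[0] += x[2]
    g[2] += x[0]
    return g


def integrated_gradients(x, baseline, steps=50):
    """Approximate integrated gradients along the straight line
    baseline -> x with a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack(
        [grad_global_q(baseline + a * (x - baseline)) for a in alphas]
    )
    return (x - baseline) * grads.mean(axis=0)


def agent_credits(attributions, agent_slices):
    """Sum feature attributions within each agent's feature slice."""
    return np.array([attributions[s].sum() for s in agent_slices])


x = np.array([1.0, 2.0, -1.0, 0.5])
baseline = np.zeros(4)  # assumed reference input
attributions = integrated_gradients(x, baseline)
credits = agent_credits(attributions, [slice(0, 2), slice(2, 4)])
print("per-agent credits:", credits)
print("sum of credits:", credits.sum())
print("Q(x) - Q(baseline):", global_q(x) - global_q(baseline))
```

Because the toy gradient is linear along the path, the midpoint sum is exact up to floating-point error here; for a neural Q-network the match between the summed credits and the Q-value difference would only be approximate and would depend on the number of integration steps.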
Author Information
Yaodong Yang (Tianjin University)
Jianye Hao (Tianjin University)
Guangyong Chen (Tencent)
Hongyao Tang (Tianjin University)
Yingfeng Chen (NetEase Fuxi AI Lab)
Yujing Hu (NetEase Fuxi AI Lab)
Changjie Fan (NetEase)
Zhongyu Wei (Fudan University)
More from the Same Authors
- 2021: Optimistic Exploration with Backward Bootstrapped Bonus for Deep Reinforcement Learning
  Chenjia Bai · Lingxiao Wang · Lei Han · Jianye Hao · Animesh Garg · Peng Liu · Zhaoran Wang
- 2023: Boosting Off-policy RL with Policy Representation and Policy-extended Value Function Approximator
  Min Zhang · Jianye Hao · Hongyao Tang · Yan Zheng
- 2023: A Policy-Decoupled Method for High-Quality Data Augmentation in Offline Reinforcement Learning
  Shixi Lian · Yi Ma · Jinyi Liu · Jianye Hao · Yan Zheng · Zhaopeng Meng
- 2023: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles
  Kai Zhao · Yi Ma · Jinyi Liu · Jianye Hao · Yan Zheng · Zhaopeng Meng
- 2023: Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning
  Jinyi Liu · Yi Ma · Jianye Hao · Yujing Hu · Yan Zheng · Tangjie Lv · Changjie Fan
- 2023 Poster: RACE: Improve Multi-Agent Reinforcement Learning with Representation Asymmetry and Collaborative Evolution
  Pengyi Li · Jianye Hao · Hongyao Tang · Yan Zheng · Xian Fu
- 2023 Poster: MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL
  Fei Ni · Jianye Hao · Yao Mu · Yifu Yuan · Yan Zheng · Bin Wang · Zhixuan Liang
- 2023 Poster: ChiPFormer: Transferable Chip Placement via Offline Decision Transformer
  Yao LAI · Jinxin Liu · Zhentao Tang · Bin Wang · Jianye Hao · Ping Luo
- 2022 Poster: Individual Reward Assisted Multi-Agent Reinforcement Learning
  Li Wang · Yupeng Zhang · Yujing Hu · Weixun Wang · Chongjie Zhang · Yang Gao · Jianye Hao · Tangjie Lv · Changjie Fan
- 2022 Poster: PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration
  Pengyi Li · Hongyao Tang · Tianpei Yang · Xiaotian Hao · Tong Sang · Yan Zheng · Jianye Hao · Matthew Taylor · Wenyuan Tao · Zhen Wang
- 2022 Spotlight: Individual Reward Assisted Multi-Agent Reinforcement Learning
  Li Wang · Yupeng Zhang · Yujing Hu · Weixun Wang · Chongjie Zhang · Yang Gao · Jianye Hao · Tangjie Lv · Changjie Fan
- 2022 Spotlight: PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration
  Pengyi Li · Hongyao Tang · Tianpei Yang · Xiaotian Hao · Tong Sang · Yan Zheng · Jianye Hao · Matthew Taylor · Wenyuan Tao · Zhen Wang
- 2021 Poster: MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration
  Jin Zhang · Jianhao Wang · Hao Hu · Tong Chen · Yingfeng Chen · Changjie Fan · Chongjie Zhang
- 2021 Spotlight: MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration
  Jin Zhang · Jianhao Wang · Hao Hu · Tong Chen · Yingfeng Chen · Changjie Fan · Chongjie Zhang
- 2021 Poster: Principled Exploration via Optimistic Bootstrapping and Backward Induction
  Chenjia Bai · Lingxiao Wang · Lei Han · Jianye Hao · Animesh Garg · Peng Liu · Zhaoran Wang
- 2021 Spotlight: Principled Exploration via Optimistic Bootstrapping and Backward Induction
  Chenjia Bai · Lingxiao Wang · Lei Han · Jianye Hao · Animesh Garg · Peng Liu · Zhaoran Wang
- 2020 Poster: Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising
  Xiaotian Hao · Zhaoqing Peng · Yi Ma · Guan Wang · Junqi Jin · Jianye Hao · Shan Chen · Rongquan Bai · Mingzhou Xie · Miao Xu · Zhenzhe Zheng · Chuan Yu · HAN LI · Jian Xu · Kun Gai