Timezone: »
In many real-world settings, a team of agents must coordinate their behaviour while acting in a decentralised way. At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations. We structurally enforce that the joint-action value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning, and guarantees consistency between the centralised and decentralised policies. We evaluate QMIX on a challenging set of StarCraft II micromanagement tasks, and show that QMIX significantly outperforms existing value-based multi-agent reinforcement learning methods.
Author Information
Tabish Rashid (University of Oxford)
Mikayel Samvelyan (Russian-Armenian University)
Christian Schroeder (University of Oxford)
Gregory Farquhar (University of Oxford)
Jakob Foerster (Facebook AI Research)
Shimon Whiteson (University of Oxford)
Related Events (a corresponding poster, oral, or spotlight)
-
2018 Oral: QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning »
Thu. Jul 12th 09:50 -- 10:00 AM Room A3
More from the Same Authors
-
2022 Workshop: AI for Agent-Based Modelling (AI4ABM) »
Christian Schroeder · Yang Zhang · Anisoara Calinescu · Dylan Radovic · Prateek Gupta · Jakob Foerster -
2022 : Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the Age of AI-NIDS »
Christian Schroeder · Yongchao Huang · Phil Torr · Martin Strohmeier -
2022 : Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the Age of AI-NIDS »
Christian Schroeder · Yongchao Huang · Phil Torr · Martin Strohmeier -
2022 Poster: Evolving Curricula with Regret-Based Environment Design »
Jack Parker-Holder · Minqi Jiang · Michael Dennis · Mikayel Samvelyan · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel -
2022 Spotlight: Evolving Curricula with Regret-Based Environment Design »
Jack Parker-Holder · Minqi Jiang · Michael Dennis · Mikayel Samvelyan · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel -
2022 Poster: Communicating via Markov Decision Processes »
Samuel Sokota · Christian Schroeder · Maximilian Igl · Luisa Zintgraf · Phil Torr · Martin Strohmeier · Zico Kolter · Shimon Whiteson · Jakob Foerster -
2022 Spotlight: Communicating via Markov Decision Processes »
Samuel Sokota · Christian Schroeder · Maximilian Igl · Luisa Zintgraf · Phil Torr · Martin Strohmeier · Zico Kolter · Shimon Whiteson · Jakob Foerster -
2022 Poster: Generalized Beliefs for Cooperative AI »
Darius Muglich · Luisa Zintgraf · Christian Schroeder de Witt · Shimon Whiteson · Jakob Foerster -
2022 Spotlight: Generalized Beliefs for Cooperative AI »
Darius Muglich · Luisa Zintgraf · Christian Schroeder de Witt · Shimon Whiteson · Jakob Foerster -
2021 Poster: Average-Reward Off-Policy Policy Evaluation with Function Approximation »
Shangtong Zhang · Yi Wan · Richard Sutton · Shimon Whiteson -
2021 Poster: Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning »
Luisa Zintgraf · Leo Feng · Cong Lu · Maximilian Igl · Kristian Hartikainen · Katja Hofmann · Shimon Whiteson -
2021 Spotlight: Average-Reward Off-Policy Policy Evaluation with Function Approximation »
Shangtong Zhang · Yi Wan · Richard Sutton · Shimon Whiteson -
2021 Spotlight: Breaking the Deadly Triad with a Target Network »
Shangtong Zhang · Hengshuai Yao · Shimon Whiteson -
2021 Spotlight: Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning »
Luisa Zintgraf · Leo Feng · Cong Lu · Maximilian Igl · Kristian Hartikainen · Katja Hofmann · Shimon Whiteson -
2021 Poster: Breaking the Deadly Triad with a Target Network »
Shangtong Zhang · Hengshuai Yao · Shimon Whiteson -
2021 Poster: Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning »
Shariq Iqbal · Christian Schroeder · Bei Peng · Wendelin Boehmer · Shimon Whiteson · Fei Sha -
2021 Oral: Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning »
Shariq Iqbal · Christian Schroeder · Bei Peng · Wendelin Boehmer · Shimon Whiteson · Fei Sha -
2021 Poster: Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning »
Anuj Mahajan · Mikayel Samvelyan · Lei Mao · Viktor Makoviychuk · Animesh Garg · Jean Kossaifi · Shimon Whiteson · Yuke Zhu · Anima Anandkumar -
2021 Poster: UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning »
Tarun Gupta · Anuj Mahajan · Bei Peng · Wendelin Boehmer · Shimon Whiteson -
2021 Spotlight: Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning »
Anuj Mahajan · Mikayel Samvelyan · Lei Mao · Viktor Makoviychuk · Animesh Garg · Jean Kossaifi · Shimon Whiteson · Yuke Zhu · Anima Anandkumar -
2021 Spotlight: UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning »
Tarun Gupta · Anuj Mahajan · Bei Peng · Wendelin Boehmer · Shimon Whiteson -
2020 Poster: Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation »
Shangtong Zhang · Bo Liu · Hengshuai Yao · Shimon Whiteson -
2020 Poster: Deep Coordination Graphs »
Wendelin Boehmer · Vitaly Kurin · Shimon Whiteson -
2020 Poster: GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values »
Shangtong Zhang · Bo Liu · Shimon Whiteson -
2019 : Poster Session #1 »
Adrien Ali Taiga · Aniket Anand Deshmukh · Tabish Rashid · Jonathan Binas · Nikolaus Yasui · Vitchyr Pong · Takahisa Imagawa · Jesse Clifton · Siddharth Mysore · Shi-Chun Tsai · Caleb Chuck · Giulia Vezzani · Hannes Bengt Eriksson -
2019 : "Ideas" mini-spotlights »
Kevin McCloskey · Nikola Milojevic-Dupont · Jonathan Binas · Christian Schroeder · Sasha Luccioni -
2019 : Networking Lunch (provided) + Poster Session »
Abraham Stanway · Alex Robson · Aneesh Rangnekar · Ashesh Chattopadhyay · Ashley Pilipiszyn · Benjamin LeRoy · Bolong Cheng · Ce Zhang · Chaopeng Shen · Christian Schroeder · Christian Clough · Clement DUHART · Clement Fung · Cozmin Ududec · Dali Wang · David Dao · di wu · Dimitrios Giannakis · Dino Sejdinovic · Doina Precup · Duncan Watson-Parris · Gege Wen · George Chen · Gopal Erinjippurath · Haifeng Li · Han Zou · Herke van Hoof · Hillary A Scannell · Hiroshi Mamitsuka · Hongbao Zhang · Jaegul Choo · James Wang · James Requeima · Jessica Hwang · Jinfan Xu · Johan Mathe · Jonathan Binas · Joonseok Lee · Kalai Ramea · Kate Duffy · Kevin McCloskey · Kris Sankaran · Lester Mackey · Letif Mones · Loubna Benabbou · Lynn Kaack · Matthew Hoffman · Mayur Mudigonda · Mehrdad Mahdavi · Michael McCourt · Mingchao Jiang · Mohammad Mahdi Kamani · Neel Guha · Niccolo Dalmasso · Nick Pawlowski · Nikola Milojevic-Dupont · Paulo Orenstein · Pedram Hassanzadeh · Pekka Marttinen · Ramesh Nair · Sadegh Farhang · Samuel Kaski · Sandeep Manjanna · Sasha Luccioni · Shuby Deshpande · Soo Kim · Soukayna Mouatadid · Sunghyun Park · Tao Lin · Telmo Felgueira · Thomas Hornigold · Tianle Yuan · Tom Beucler · Tracy Cui · Volodymyr Kuleshov · Wei Yu · yang song · Ydo Wexler · Yoshua Bengio · Zhecheng Wang · Zhuangfang Yi · Zouheir Malki -
2019 Poster: Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Francis Song · Edward Hughes · Neil Burch · Iain Dunning · Shimon Whiteson · Matthew Botvinick · Michael Bowling -
2019 Oral: Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Francis Song · Edward Hughes · Neil Burch · Iain Dunning · Shimon Whiteson · Matthew Botvinick · Michael Bowling -
2019 Poster: A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs »
Jingkai Mao · Jakob Foerster · Tim Rocktäschel · Maruan Al-Shedivat · Gregory Farquhar · Shimon Whiteson -
2019 Poster: Fast Context Adaptation via Meta-Learning »
Luisa Zintgraf · Kyriacos Shiarlis · Vitaly Kurin · Katja Hofmann · Shimon Whiteson -
2019 Oral: A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs »
Jingkai Mao · Jakob Foerster · Tim Rocktäschel · Maruan Al-Shedivat · Gregory Farquhar · Shimon Whiteson -
2019 Oral: Fast Context Adaptation via Meta-Learning »
Luisa Zintgraf · Kyriacos Shiarlis · Vitaly Kurin · Katja Hofmann · Shimon Whiteson -
2019 Poster: Fingerprint Policy Optimisation for Robust Reinforcement Learning »
Supratik Paul · Michael A Osborne · Shimon Whiteson -
2019 Oral: Fingerprint Policy Optimisation for Robust Reinforcement Learning »
Supratik Paul · Michael A Osborne · Shimon Whiteson -
2018 Poster: Fourier Policy Gradients »
Mattie Fellows · Kamil Ciosek · Shimon Whiteson -
2018 Oral: Fourier Policy Gradients »
Mattie Fellows · Kamil Ciosek · Shimon Whiteson -
2018 Poster: The Mechanics of n-Player Differentiable Games »
David Balduzzi · Sebastien Racaniere · James Martens · Jakob Foerster · Karl Tuyls · Thore Graepel -
2018 Poster: Deep Variational Reinforcement Learning for POMDPs »
Maximilian Igl · Luisa Zintgraf · Tuan Anh Le · Frank Wood · Shimon Whiteson -
2018 Oral: Deep Variational Reinforcement Learning for POMDPs »
Maximilian Igl · Luisa Zintgraf · Tuan Anh Le · Frank Wood · Shimon Whiteson -
2018 Oral: The Mechanics of n-Player Differentiable Games »
David Balduzzi · Sebastien Racaniere · James Martens · Jakob Foerster · Karl Tuyls · Thore Graepel -
2018 Poster: DiCE: The Infinitely Differentiable Monte Carlo Estimator »
Jakob Foerster · Gregory Farquhar · Maruan Al-Shedivat · Tim Rocktäschel · Eric Xing · Shimon Whiteson -
2018 Poster: TACO: Learning Task Decomposition via Temporal Alignment for Control »
Kyriacos Shiarlis · Markus Wulfmeier · Sasha Salter · Shimon Whiteson · Ingmar Posner -
2018 Oral: TACO: Learning Task Decomposition via Temporal Alignment for Control »
Kyriacos Shiarlis · Markus Wulfmeier · Sasha Salter · Shimon Whiteson · Ingmar Posner -
2018 Oral: DiCE: The Infinitely Differentiable Monte Carlo Estimator »
Jakob Foerster · Gregory Farquhar · Maruan Al-Shedivat · Tim Rocktäschel · Eric Xing · Shimon Whiteson -
2017 Poster: Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Nantas Nardelli · Gregory Farquhar · Triantafyllos Afouras · Phil Torr · Pushmeet Kohli · Shimon Whiteson -
2017 Talk: Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Nantas Nardelli · Gregory Farquhar · Triantafyllos Afouras · Phil Torr · Pushmeet Kohli · Shimon Whiteson -
2017 Poster: Input Switched Affine Networks: An RNN Architecture Designed for Interpretability »
Jakob Foerster · Justin Gilmer · Jan Chorowski · Jascha Sohl-Dickstein · David Sussillo -
2017 Talk: Input Switched Affine Networks: An RNN Architecture Designed for Interpretability »
Jakob Foerster · Justin Gilmer · Jan Chorowski · Jascha Sohl-Dickstein · David Sussillo