Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly with problem size. A key challenge is therefore to translate the success of deep learning in single-agent RL to the multi-agent setting. A major stumbling block is that independent Q-learning, the most popular multi-agent RL method, introduces nonstationarity that makes it incompatible with the experience replay memory on which deep Q-learning relies. This paper proposes two methods that address this problem: 1) using a multi-agent variant of importance sampling to naturally decay obsolete data and 2) conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory. Results on a challenging decentralised variant of StarCraft unit micromanagement confirm that these methods enable the successful combination of experience replay with multi-agent RL.
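As a rough illustration of the second idea, the sketch below stores a fingerprint with every transition so that a value function conditioned on the augmented observation can tell old data from new. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say what the fingerprint contains, so using the training iteration and the exploration rate is an assumption, and the class name `FingerprintReplay` and its methods are hypothetical.

```python
import random
from collections import deque


class FingerprintReplay:
    """Replay memory that appends a fingerprint to each stored observation."""

    def __init__(self, capacity):
        # Oldest transitions are discarded once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, iteration, epsilon):
        # Assumed fingerprint: training iteration and exploration rate
        # at the time the transition was collected.
        fingerprint = (float(iteration), float(epsilon))
        # Simplification: the same fingerprint is attached to both observations.
        self.buffer.append(
            (tuple(obs) + fingerprint, action, reward, tuple(next_obs) + fingerprint)
        )

    def sample(self, batch_size):
        # Uniform sampling; each sample carries the fingerprint it was stored with.
        return random.sample(list(self.buffer), batch_size)


memory = FingerprintReplay(capacity=2)
memory.add([0.1, 0.2], action=1, reward=0.0, next_obs=[0.3, 0.4], iteration=10, epsilon=0.5)
memory.add([0.3, 0.4], action=0, reward=1.0, next_obs=[0.5, 0.6], iteration=11, epsilon=0.4)
batch = memory.sample(2)
```

Each agent's Q-network would then take the augmented observation as input, which lets it condition on how old a sampled transition is rather than treating all replayed data as equally current.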
Author Information
Jakob Foerster (University of Oxford)
Nantas Nardelli (University of Oxford)
Gregory Farquhar (University of Oxford)
Triantafyllos Afouras (University of Oxford)
Phil Torr (Oxford)
Pushmeet Kohli (Microsoft Research)
Shimon Whiteson (University of Oxford)
Related Events (a corresponding poster, oral, or spotlight)
-
2017 Talk: Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning »
Wed. Aug 9th 04:06 -- 04:24 AM Room C4.5
More from the Same Authors
-
2021 : Combating Adversaries with Anti-Adversaries »
Motasem Alfarra · Juan C Perez · Ali Thabet · Adel Bibi · Phil Torr · Bernard Ghanem -
2021 : Detecting and Quantifying Malicious Activity with Simulation-based Inference »
Andrew Gambardella · Naeemullah Khan · Phil Torr · Atilim Gunes Baydin -
2022 : Make Some Noise: Reliable and Efficient Single-Step Adversarial Training »
Pau de Jorge Aranda · Adel Bibi · Riccardo Volpi · Amartya Sanyal · Phil Torr · Gregory Rogez · Puneet Dokania -
2022 : Catastrophic overfitting is a bug but also a feature »
Guillermo Ortiz Jimenez · Pau de Jorge Aranda · Amartya Sanyal · Adel Bibi · Puneet Dokania · Pascal Frossard · Gregory Rogez · Phil Torr -
2022 : Illusionary Attacks on Sequential Decision Makers and Countermeasures »
Tim Franzmeyer · Joao Henriques · Jakob Foerster · Phil Torr · Adel Bibi · Christian Schroeder -
2022 : How robust are pre-trained models to distribution shift? »
Yuge Shi · Imant Daunhawer · Julia Vogt · Phil Torr · Amartya Sanyal -
2023 : Illusory Attacks: Detectability Matters in Adversarial Attacks on Sequential Decision-Makers »
Tim Franzmeyer · Stephen Mcaleer · Joao Henriques · Jakob Foerster · Phil Torr · Adel Bibi · Christian Schroeder -
2023 : Certified Calibration: Bounding Worst-Case Calibration under Adversarial Attacks »
Cornelius Emde · Francesco Pinto · Thomas Lukasiewicz · Phil Torr · Adel Bibi -
2023 : Certifying Ensembles: A General Certification Theory with S-Lipschitzness »
Aleksandar Petrov · Francisco Eiras · Amartya Sanyal · Phil Torr · Adel Bibi -
2023 : Language Model Tokenizers Introduce Unfairness Between Languages »
Aleksandar Petrov · Emanuele La Malfa · Phil Torr · Adel Bibi -
2023 : Who to imitate: Imitating desired behavior from diverse multi-agent datasets »
Tim Franzmeyer · Jakob Foerster · Edith Elkind · Phil Torr · Joao Henriques -
2023 : Provably Correct Physics-Informed Neural Networks »
Francisco Girbal Eiras · Adel Bibi · Rudy Bunel · Krishnamurthy Dvijotham · Phil Torr · M. Pawan Kumar -
2023 Poster: Graph Inductive Biases in Transformers without Message Passing »
Liheng Ma · Chen Lin · Derek Lim · Adriana Romero Soriano · Puneet Dokania · Mark Coates · Phil Torr · Ser Nam Lim -
2023 Poster: Certifying Ensembles: A General Certification Theory with S-Lipschitzness »
Aleksandar Petrov · Francisco Eiras · Amartya Sanyal · Phil Torr · Adel Bibi -
2022 : Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the Age of AI-NIDS »
Christian Schroeder · Yongchao Huang · Phil Torr · Martin Strohmeier -
2022 Poster: Adversarial Masking for Self-Supervised Learning »
Yuge Shi · Siddharth N · Phil Torr · Adam Kosiorek -
2022 Spotlight: Adversarial Masking for Self-Supervised Learning »
Yuge Shi · Siddharth N · Phil Torr · Adam Kosiorek -
2022 Poster: Communicating via Markov Decision Processes »
Samuel Sokota · Christian Schroeder · Maximilian Igl · Luisa Zintgraf · Phil Torr · Martin Strohmeier · Zico Kolter · Shimon Whiteson · Jakob Foerster -
2022 Spotlight: Communicating via Markov Decision Processes »
Samuel Sokota · Christian Schroeder · Maximilian Igl · Luisa Zintgraf · Phil Torr · Martin Strohmeier · Zico Kolter · Shimon Whiteson · Jakob Foerster -
2022 Poster: Generalized Beliefs for Cooperative AI »
Darius Muglich · Luisa Zintgraf · Christian Schroeder de Witt · Shimon Whiteson · Jakob Foerster -
2022 Spotlight: Generalized Beliefs for Cooperative AI »
Darius Muglich · Luisa Zintgraf · Christian Schroeder de Witt · Shimon Whiteson · Jakob Foerster -
2021 Poster: Average-Reward Off-Policy Policy Evaluation with Function Approximation »
Shangtong Zhang · Yi Wan · Richard Sutton · Shimon Whiteson -
2021 Poster: Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning »
Luisa Zintgraf · Leo Feng · Cong Lu · Maximilian Igl · Kristian Hartikainen · Katja Hofmann · Shimon Whiteson -
2021 Spotlight: Average-Reward Off-Policy Policy Evaluation with Function Approximation »
Shangtong Zhang · Yi Wan · Richard Sutton · Shimon Whiteson -
2021 Spotlight: Breaking the Deadly Triad with a Target Network »
Shangtong Zhang · Hengshuai Yao · Shimon Whiteson -
2021 Spotlight: Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning »
Luisa Zintgraf · Leo Feng · Cong Lu · Maximilian Igl · Kristian Hartikainen · Katja Hofmann · Shimon Whiteson -
2021 Poster: Breaking the Deadly Triad with a Target Network »
Shangtong Zhang · Hengshuai Yao · Shimon Whiteson -
2021 Poster: Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning »
Shariq Iqbal · Christian Schroeder · Bei Peng · Wendelin Boehmer · Shimon Whiteson · Fei Sha -
2021 Oral: Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning »
Shariq Iqbal · Christian Schroeder · Bei Peng · Wendelin Boehmer · Shimon Whiteson · Fei Sha -
2021 Poster: Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning »
Anuj Mahajan · Mikayel Samvelyan · Lei Mao · Viktor Makoviychuk · Animesh Garg · Jean Kossaifi · Shimon Whiteson · Yuke Zhu · Anima Anandkumar -
2021 Poster: UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning »
Tarun Gupta · Anuj Mahajan · Bei Peng · Wendelin Boehmer · Shimon Whiteson -
2021 Spotlight: Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning »
Anuj Mahajan · Mikayel Samvelyan · Lei Mao · Viktor Makoviychuk · Animesh Garg · Jean Kossaifi · Shimon Whiteson · Yuke Zhu · Anima Anandkumar -
2021 Spotlight: UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning »
Tarun Gupta · Anuj Mahajan · Bei Peng · Wendelin Boehmer · Shimon Whiteson -
2020 : Invited Talk: Alison Gopnik »
Nantas Nardelli -
2020 : Poster session 2 »
Nantas Nardelli -
2020 : Poster session 1 »
Nantas Nardelli -
2020 : Invited Talk: Arthur Szlam »
Nantas Nardelli -
2020 Workshop: 1st Workshop on Language in Reinforcement Learning (LaReL) »
Nantas Nardelli · Jelena Luketina · Jakob Foerster · Victor Zhong · Jacob Andreas · Tim Rocktäschel · Edward Grefenstette -
2020 Poster: Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation »
Shangtong Zhang · Bo Liu · Hengshuai Yao · Shimon Whiteson -
2020 Poster: Deep Coordination Graphs »
Wendelin Boehmer · Vitaly Kurin · Shimon Whiteson -
2020 Poster: GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values »
Shangtong Zhang · Bo Liu · Shimon Whiteson -
2019 Poster: Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Francis Song · Edward Hughes · Neil Burch · Iain Dunning · Shimon Whiteson · Matthew Botvinick · Michael Bowling -
2019 Oral: Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Francis Song · Edward Hughes · Neil Burch · Iain Dunning · Shimon Whiteson · Matthew Botvinick · Michael Bowling -
2019 Poster: A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs »
Jingkai Mao · Jakob Foerster · Tim Rocktäschel · Maruan Al-Shedivat · Gregory Farquhar · Shimon Whiteson -
2019 Poster: Fast Context Adaptation via Meta-Learning »
Luisa Zintgraf · Kyriacos Shiarlis · Vitaly Kurin · Katja Hofmann · Shimon Whiteson -
2019 Oral: A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs »
Jingkai Mao · Jakob Foerster · Tim Rocktäschel · Maruan Al-Shedivat · Gregory Farquhar · Shimon Whiteson -
2019 Oral: Fast Context Adaptation via Meta-Learning »
Luisa Zintgraf · Kyriacos Shiarlis · Vitaly Kurin · Katja Hofmann · Shimon Whiteson -
2019 Poster: Fingerprint Policy Optimisation for Robust Reinforcement Learning »
Supratik Paul · Michael A Osborne · Shimon Whiteson -
2019 Oral: Fingerprint Policy Optimisation for Robust Reinforcement Learning »
Supratik Paul · Michael A Osborne · Shimon Whiteson -
2018 Poster: Fourier Policy Gradients »
Mattie Fellows · Kamil Ciosek · Shimon Whiteson -
2018 Oral: Fourier Policy Gradients »
Mattie Fellows · Kamil Ciosek · Shimon Whiteson -
2018 Poster: The Mechanics of n-Player Differentiable Games »
David Balduzzi · Sebastien Racaniere · James Martens · Jakob Foerster · Karl Tuyls · Thore Graepel -
2018 Poster: QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning »
Tabish Rashid · Mikayel Samvelyan · Christian Schroeder · Gregory Farquhar · Jakob Foerster · Shimon Whiteson -
2018 Poster: Deep Variational Reinforcement Learning for POMDPs »
Maximilian Igl · Luisa Zintgraf · Tuan Anh Le · Frank Wood · Shimon Whiteson -
2018 Oral: Deep Variational Reinforcement Learning for POMDPs »
Maximilian Igl · Luisa Zintgraf · Tuan Anh Le · Frank Wood · Shimon Whiteson -
2018 Oral: The Mechanics of n-Player Differentiable Games »
David Balduzzi · Sebastien Racaniere · James Martens · Jakob Foerster · Karl Tuyls · Thore Graepel -
2018 Oral: QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning »
Tabish Rashid · Mikayel Samvelyan · Christian Schroeder · Gregory Farquhar · Jakob Foerster · Shimon Whiteson -
2018 Poster: DiCE: The Infinitely Differentiable Monte Carlo Estimator »
Jakob Foerster · Gregory Farquhar · Maruan Al-Shedivat · Tim Rocktäschel · Eric Xing · Shimon Whiteson -
2018 Poster: TACO: Learning Task Decomposition via Temporal Alignment for Control »
Kyriacos Shiarlis · Markus Wulfmeier · Sasha Salter · Shimon Whiteson · Ingmar Posner -
2018 Oral: TACO: Learning Task Decomposition via Temporal Alignment for Control »
Kyriacos Shiarlis · Markus Wulfmeier · Sasha Salter · Shimon Whiteson · Ingmar Posner -
2018 Oral: DiCE: The Infinitely Differentiable Monte Carlo Estimator »
Jakob Foerster · Gregory Farquhar · Maruan Al-Shedivat · Tim Rocktäschel · Eric Xing · Shimon Whiteson -
2017 Poster: Learning Continuous Semantic Representations of Symbolic Expressions »
Miltiadis Allamanis · Pankajan Chanthirasegaran · Pushmeet Kohli · Charles Sutton -
2017 Poster: Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning »
Junhyuk Oh · Satinder Singh · Honglak Lee · Pushmeet Kohli -
2017 Talk: Learning Continuous Semantic Representations of Symbolic Expressions »
Miltiadis Allamanis · Pankajan Chanthirasegaran · Pushmeet Kohli · Charles Sutton -
2017 Talk: Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning »
Junhyuk Oh · Satinder Singh · Honglak Lee · Pushmeet Kohli -
2017 Poster: RobustFill: Neural Program Learning under Noisy I/O »
Jacob Devlin · Jonathan Uesato · Surya Bhupatiraju · Rishabh Singh · Abdelrahman Mohammad · Pushmeet Kohli -
2017 Poster: Input Switched Affine Networks: An RNN Architecture Designed for Interpretability »
Jakob Foerster · Justin Gilmer · Jan Chorowski · Jascha Sohl-Dickstein · David Sussillo -
2017 Talk: RobustFill: Neural Program Learning under Noisy I/O »
Jacob Devlin · Jonathan Uesato · Surya Bhupatiraju · Rishabh Singh · Abdelrahman Mohammad · Pushmeet Kohli -
2017 Talk: Input Switched Affine Networks: An RNN Architecture Designed for Interpretability »
Jakob Foerster · Justin Gilmer · Jan Chorowski · Jascha Sohl-Dickstein · David Sussillo -
2017 Poster: Batched High-dimensional Bayesian Optimization via Structural Kernel Learning »
Zi Wang · Chengtao Li · Stefanie Jegelka · Pushmeet Kohli -
2017 Talk: Batched High-dimensional Bayesian Optimization via Structural Kernel Learning »
Zi Wang · Chengtao Li · Stefanie Jegelka · Pushmeet Kohli