An unaddressed challenge in multi-agent coordination is to enable AI agents to exploit the semantic relationships between the features of actions and the features of observations. Humans take advantage of these relationships in highly intuitive ways. For instance, in the absence of a shared language, we might point to the object we desire or hold up our fingers to indicate how many objects we want. To address this challenge, we investigate the effect of network architecture on the propensity of learning algorithms to exploit these semantic relationships. Across a procedurally generated coordination task, we find that attention-based architectures that jointly process a featurized representation of observations and actions have a better inductive bias for learning intuitive policies. Through fine-grained evaluation and scenario analysis, we show that the resulting policies are human-interpretable. Moreover, such agents coordinate with people without training on any human data.
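As a rough illustration of what "jointly processing a featurized representation of observations and actions" can look like, the toy sketch below scores each action by letting its feature vector attend over observation feature vectors. It is not the architecture from the paper: there are no learned weights, and the function names (`action_scores`, `policy`) and the one-hot "color" features are assumptions made purely for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def action_scores(action_feats, obs_feats):
    """Score each action by dot-product attention over observation tokens.

    Hypothetical helper: each action feature vector acts as a query, and the
    observation feature vectors act as both keys and values.
    """
    scale = math.sqrt(len(action_feats[0]))
    scores = []
    for a in action_feats:
        sims = [dot(a, o) / scale for o in obs_feats]
        weights = softmax(sims)
        # Attended context: weighted average of the observation tokens.
        context = [
            sum(w * o[i] for w, o in zip(weights, obs_feats))
            for i in range(len(a))
        ]
        # An action scores highly when its features match what it attends to.
        scores.append(dot(a, context))
    return scores


def policy(action_feats, obs_feats):
    """Turn the attention scores into a probability distribution over actions."""
    return softmax(action_scores(action_feats, obs_feats))


if __name__ == "__main__":
    # Assumed toy features: one-hot colors (red, green, blue).
    actions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # The observation contains two "red" cues and one "green" cue.
    observation = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(policy(actions, observation))
```

Because actions and observations share a feature space and are processed per item, reordering the action list only reorders the output probabilities. This is one simple way to see why such a design might favor policies that follow semantic similarity rather than arbitrary action indices.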
Author Information
Mingwei Ma (University of Chicago)
Mingwei is a Ph.D. and MBA candidate at Chicago Booth, generously funded by a Booth PhD Fellowship. His research interests include deep reinforcement learning, financial asset pricing, and high-frequency data. His work has broad application in the systematic investment and algorithmic trading industry. Prior to his PhD, Mingwei received a BA in Physics and Philosophy and an MSc in Mathematical Physics from the University of Oxford, where he specialized in computational and mathematical physics as well as large-scale data analysis.
Jizhou Liu (University of Chicago)
Samuel Sokota (Carnegie Mellon University)
Max Kleiman-Weiner
Jakob Foerster (University of Oxford)
Jakob Foerster started as an Associate Professor at the Department of Engineering Science at the University of Oxford in the fall of 2021. During his PhD at Oxford he helped bring deep multi-agent reinforcement learning to the forefront of AI research and interned at Google Brain, OpenAI, and DeepMind. After his PhD he worked as a research scientist at Facebook AI Research in California, where he continued doing foundational work. He was the lead organizer of the first Emergent Communication workshop at NeurIPS in 2017, which he has helped organize ever since, and he was awarded a prestigious CIFAR AI chair in 2019. His past work addresses how AI agents can learn to cooperate and communicate with other agents. Most recently he has been developing and addressing the zero-shot coordination problem setting, a crucial step towards human-AI coordination.
More from the Same Authors
-
2022 : Adversarial Cheap Talk »
Christopher Lu · Timon Willi · Alistair Letcher · Jakob Foerster -
2022 : Illusionary Attacks on Sequential Decision Makers and Countermeasures »
Tim Franzmeyer · Joao Henriques · Jakob Foerster · Phil Torr · Adel Bibi · Christian Schroeder -
2022 : Discovered Policy Optimisation »
Christopher Lu · Jakub Grudzien Kuba · Alistair Letcher · Luke Metz · Christian Schroeder · Jakob Foerster -
2023 : Illusory Attacks: Detectability Matters in Adversarial Attacks on Sequential Decision-Makers »
Tim Franzmeyer · Stephen McAleer · Joao Henriques · Jakob Foerster · Phil Torr · Adel Bibi · Christian Schroeder -
2023 : Analyzing the Sample Complexity of Model-Free Opponent Shaping »
Kitty Fung · Qizhen Zhang · Christopher Lu · Timon Willi · Jakob Foerster -
2023 : Structured State Space Models for In-Context Reinforcement Learning »
Christopher Lu · Yannick Schroecker · Albert Gu · Emilio Parisotto · Jakob Foerster · Satinder Singh · Feryal Behbahani -
2023 : Who to imitate: Imitating desired behavior from diverse multi-agent datasets »
Tim Franzmeyer · Jakob Foerster · Edith Elkind · Phil Torr · Joao Henriques -
2023 Poster: Abstracting Imperfect Information Away from Two-Player Zero-Sum Games »
Samuel Sokota · Ryan D'Orazio · Chun Kai Ling · David Wu · Zico Kolter · Noam Brown -
2023 Poster: Adversarial Cheap Talk »
Christopher Lu · Timon Willi · Alistair Letcher · Jakob Foerster -
2022 Poster: Evolving Curricula with Regret-Based Environment Design »
Jack Parker-Holder · Minqi Jiang · Michael Dennis · Mikayel Samvelyan · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel -
2022 Poster: COLA: Consistent Learning with Opponent-Learning Awareness »
Timon Willi · Alistair Letcher · Johannes Treutlein · Jakob Foerster -
2022 Spotlight: Evolving Curricula with Regret-Based Environment Design »
Jack Parker-Holder · Minqi Jiang · Michael Dennis · Mikayel Samvelyan · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel -
2022 Spotlight: COLA: Consistent Learning with Opponent-Learning Awareness »
Timon Willi · Alistair Letcher · Johannes Treutlein · Jakob Foerster -
2022 Poster: Communicating via Markov Decision Processes »
Samuel Sokota · Christian Schroeder · Maximilian Igl · Luisa Zintgraf · Phil Torr · Martin Strohmeier · Zico Kolter · Shimon Whiteson · Jakob Foerster -
2022 Spotlight: Communicating via Markov Decision Processes »
Samuel Sokota · Christian Schroeder · Maximilian Igl · Luisa Zintgraf · Phil Torr · Martin Strohmeier · Zico Kolter · Shimon Whiteson · Jakob Foerster -
2022 Poster: Model-Free Opponent Shaping »
Christopher Lu · Timon Willi · Christian Schroeder de Witt · Jakob Foerster -
2022 Poster: Mirror Learning: A Unifying Framework of Policy Optimisation »
Jakub Grudzien Kuba · Christian Schroeder de Witt · Jakob Foerster -
2022 Poster: Generalized Beliefs for Cooperative AI »
Darius Muglich · Luisa Zintgraf · Christian Schroeder de Witt · Shimon Whiteson · Jakob Foerster -
2022 Spotlight: Generalized Beliefs for Cooperative AI »
Darius Muglich · Luisa Zintgraf · Christian Schroeder de Witt · Shimon Whiteson · Jakob Foerster -
2022 Spotlight: Model-Free Opponent Shaping »
Christopher Lu · Timon Willi · Christian Schroeder de Witt · Jakob Foerster -
2022 Spotlight: Mirror Learning: A Unifying Framework of Policy Optimisation »
Jakub Grudzien Kuba · Christian Schroeder de Witt · Jakob Foerster -
2021 Poster: Off-Belief Learning »
Hengyuan Hu · Adam Lerer · Brandon Cui · Luis Pineda · Noam Brown · Jakob Foerster -
2021 Spotlight: Off-Belief Learning »
Hengyuan Hu · Adam Lerer · Brandon Cui · Luis Pineda · Noam Brown · Jakob Foerster -
2021 Poster: Trajectory Diversity for Zero-Shot Coordination »
Andrei Lupu · Brandon Cui · Hengyuan Hu · Jakob Foerster -
2021 Spotlight: Trajectory Diversity for Zero-Shot Coordination »
Andrei Lupu · Brandon Cui · Hengyuan Hu · Jakob Foerster -
2021 Poster: A New Formalism, Method and Open Issues for Zero-Shot Coordination »
Johannes Treutlein · Michael Dennis · Caspar Oesterheld · Jakob Foerster -
2021 Spotlight: A New Formalism, Method and Open Issues for Zero-Shot Coordination »
Johannes Treutlein · Michael Dennis · Caspar Oesterheld · Jakob Foerster -
2020 Poster: “Other-Play” for Zero-Shot Coordination »
Hengyuan Hu · Alexander Peysakhovich · Adam Lerer · Jakob Foerster -
2019 Poster: Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Francis Song · Edward Hughes · Neil Burch · Iain Dunning · Shimon Whiteson · Matthew Botvinick · Michael Bowling -
2019 Oral: Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Francis Song · Edward Hughes · Neil Burch · Iain Dunning · Shimon Whiteson · Matthew Botvinick · Michael Bowling -
2019 Poster: A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs »
Jingkai Mao · Jakob Foerster · Tim Rocktäschel · Maruan Al-Shedivat · Gregory Farquhar · Shimon Whiteson -
2019 Oral: A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs »
Jingkai Mao · Jakob Foerster · Tim Rocktäschel · Maruan Al-Shedivat · Gregory Farquhar · Shimon Whiteson -
2018 Poster: The Mechanics of n-Player Differentiable Games »
David Balduzzi · Sebastien Racaniere · James Martens · Jakob Foerster · Karl Tuyls · Thore Graepel -
2018 Poster: QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning »
Tabish Rashid · Mikayel Samvelyan · Christian Schroeder · Gregory Farquhar · Jakob Foerster · Shimon Whiteson -
2018 Oral: The Mechanics of n-Player Differentiable Games »
David Balduzzi · Sebastien Racaniere · James Martens · Jakob Foerster · Karl Tuyls · Thore Graepel -
2018 Oral: QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning »
Tabish Rashid · Mikayel Samvelyan · Christian Schroeder · Gregory Farquhar · Jakob Foerster · Shimon Whiteson -
2018 Poster: DiCE: The Infinitely Differentiable Monte Carlo Estimator »
Jakob Foerster · Gregory Farquhar · Maruan Al-Shedivat · Tim Rocktäschel · Eric Xing · Shimon Whiteson -
2018 Oral: DiCE: The Infinitely Differentiable Monte Carlo Estimator »
Jakob Foerster · Gregory Farquhar · Maruan Al-Shedivat · Tim Rocktäschel · Eric Xing · Shimon Whiteson -
2017 Poster: Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Nantas Nardelli · Gregory Farquhar · Triantafyllos Afouras · Phil Torr · Pushmeet Kohli · Shimon Whiteson -
2017 Talk: Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Nantas Nardelli · Gregory Farquhar · Triantafyllos Afouras · Phil Torr · Pushmeet Kohli · Shimon Whiteson -
2017 Poster: Input Switched Affine Networks: An RNN Architecture Designed for Interpretability »
Jakob Foerster · Justin Gilmer · Jan Chorowski · Jascha Sohl-Dickstein · David Sussillo -
2017 Talk: Input Switched Affine Networks: An RNN Architecture Designed for Interpretability »
Jakob Foerster · Justin Gilmer · Jan Chorowski · Jascha Sohl-Dickstein · David Sussillo