Timezone: »
We introduce and analyze a class of algorithms, called Mirror Ascent against an Improved Opponent (MAIO), for computing Nash equilibria in two-player zero-sum games, both in normal form and in sequential form with imperfect information. These algorithms update the policy of each player with a mirror-ascent step to maximize the value of playing against an improved opponent. An improved opponent can be a best response, a greedy policy, a policy improved by policy gradient, or by any other reinforcement learning or search techniques. We establish a convergence result of the last iterate to the set of Nash equilibria and show that the speed of convergence depends on the amount of improvement offered by these improved policies. In addition, we show that under some condition, if we use a best response as improved policy, then an exponential convergence rate is achieved.
Author Information
Remi Munos (DeepMind)
Julien Perolat (DeepMind)
Jean-Baptiste Lespiau (DeepMind)
Mark Rowland (DeepMind)
Bart De Vylder (DeepMind)
Marc Lanctot (DeepMind)
Finbarr Timbers (DeepMind)
Daniel Hennes (DeepMind)
Shayegan Omidshafiei (DeepMind)
Audrunas Gruslys (DeepMind)
Mohammad Gheshlaghi Azar (Deepmind)
Edward Lockhart (DeepMind)
Karl Tuyls (DeepMind)
More from the Same Authors
-
2021 : Marginalized Operators for Off-Policy Reinforcement Learning »
Yunhao Tang · Mark Rowland · Remi Munos · Michal Valko -
2021 : Density-Based Bonuses on Learned Representations for Reward-Free Exploration in Deep Reinforcement Learning »
Omar Darwiche Domingues · Corentin Tallec · Remi Munos · Michal Valko -
2023 : Algorithms for Optimal Adaptation of Diffusion Models to Reward Functions »
Krishnamurthy Dvijotham · Shayegan Omidshafiei · Kimin Lee · Katie Collins · Deepak Ramachandran · Adrian Weller · Mohammad Ghavamzadeh · Milad Nasresfahani · Ying Fan · Jeremiah Liu -
2023 Poster: Understanding Self-Predictive Learning for Reinforcement Learning »
Yunhao Tang · Zhaohan Guo · Pierre Richemond · Bernardo Avila Pires · Yash Chandak · Remi Munos · Mark Rowland · Mohammad Gheshlaghi Azar · Charline Le Lan · Clare Lyle · Andras Gyorgy · Shantanu Thakoor · Will Dabney · Bilal Piot · Daniele Calandriello · Michal Valko -
2023 Poster: Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments »
Daniel Jarrett · Corentin Tallec · Florent Altché · Thomas Mesnard · Remi Munos · Michal Valko -
2023 Poster: Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition »
Yash Chandak · Shantanu Thakoor · Zhaohan Guo · Yunhao Tang · Remi Munos · Will Dabney · Diana Borsa -
2023 Poster: Towards a better understanding of representation dynamics under TD-learning »
Yunhao Tang · Remi Munos -
2023 Oral: Adapting to game trees in zero-sum imperfect information games »
Côme Fiegel · Pierre Menard · Tadashi Kozuno · Remi Munos · Vianney Perchet · Michal Valko -
2023 Poster: Bootstrapped Representations in Reinforcement Learning »
Charline Le Lan · Stephen Tu · Mark Rowland · Anna Harutyunyan · Rishabh Agarwal · Marc Bellemare · Will Dabney -
2023 Poster: Adapting to game trees in zero-sum imperfect information games »
Côme Fiegel · Pierre Menard · Tadashi Kozuno · Remi Munos · Vianney Perchet · Michal Valko -
2023 Poster: Fast Rates for Maximum Entropy Exploration »
Daniil Tiapkin · Denis Belomestny · Daniele Calandriello · Eric Moulines · Remi Munos · Alexey Naumov · Pierre Perrault · Yunhao Tang · Michal Valko · Pierre Menard -
2023 Oral: Human-Timescale Adaptation in an Open-Ended Task Space »
Jakob Bauer · Kate Baumli · Feryal Behbahani · Avishkar Bhoopchand · Natalie Bradley-Schmieg · Michael Chang · Natalie Clay · Adrian Collister · Vibhavari Dasagi · Lucy Gonzalez · Karol Gregor · Edward Hughes · Sheleem Kashem · Maria Loks-Thompson · Hannah Openshaw · Jack Parker-Holder · Shreya Pathak · Nicolas Perez-Nieves · Nemanja Rakicevic · Tim Rocktäschel · Yannick Schroecker · Satinder Singh · Jakub Sygnowski · Karl Tuyls · Sarah York · Alexander Zacherl · Lei Zhang -
2023 Oral: Quantile Credit Assignment »
Thomas Mesnard · Wenqi Chen · Alaa Saade · Yunhao Tang · Mark Rowland · Theophane Weber · Clare Lyle · Audrunas Gruslys · Michal Valko · Will Dabney · Georg Ostrovski · Eric Moulines · Remi Munos -
2023 Poster: The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation »
Mark Rowland · Yunhao Tang · Clare Lyle · Remi Munos · Marc Bellemare · Will Dabney -
2023 Poster: Quantile Credit Assignment »
Thomas Mesnard · Wenqi Chen · Alaa Saade · Yunhao Tang · Mark Rowland · Theophane Weber · Clare Lyle · Audrunas Gruslys · Michal Valko · Will Dabney · Georg Ostrovski · Eric Moulines · Remi Munos -
2023 Poster: Human-Timescale Adaptation in an Open-Ended Task Space »
Jakob Bauer · Kate Baumli · Feryal Behbahani · Avishkar Bhoopchand · Natalie Bradley-Schmieg · Michael Chang · Natalie Clay · Adrian Collister · Vibhavari Dasagi · Lucy Gonzalez · Karol Gregor · Edward Hughes · Sheleem Kashem · Maria Loks-Thompson · Hannah Openshaw · Jack Parker-Holder · Shreya Pathak · Nicolas Perez-Nieves · Nemanja Rakicevic · Tim Rocktäschel · Yannick Schroecker · Satinder Singh · Jakub Sygnowski · Karl Tuyls · Sarah York · Alexander Zacherl · Lei Zhang -
2023 Poster: DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm »
Yunhao Tang · Tadashi Kozuno · Mark Rowland · Anna Harutyunyan · Remi Munos · Bernardo Avila Pires · Michal Valko -
2023 Poster: VA-learning as a more efficient alternative to Q-learning »
Yunhao Tang · Remi Munos · Mark Rowland · Michal Valko -
2023 Poster: Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice »
Toshinori Kitamura · Tadashi Kozuno · Yunhao Tang · Nino Vieillard · Michal Valko · Wenhao Yang · Jincheng Mei · Pierre Menard · Mohammad Gheshlaghi Azar · Remi Munos · Olivier Pietquin · Matthieu Geist · Csaba Szepesvari · Wataru Kumagai · Yutaka Matsuo -
2022 : Prescriptive solutions in games: from theory to scale »
Julien Perolat -
2022 Poster: Learning Dynamics and Generalization in Deep Reinforcement Learning »
Clare Lyle · Mark Rowland · Will Dabney · Marta Kwiatkowska · Yarin Gal -
2022 Poster: Generalised Policy Improvement with Geometric Policy Composition »
Shantanu Thakoor · Mark Rowland · Diana Borsa · Will Dabney · Remi Munos · Andre Barreto -
2022 Spotlight: Learning Dynamics and Generalization in Deep Reinforcement Learning »
Clare Lyle · Mark Rowland · Will Dabney · Marta Kwiatkowska · Yarin Gal -
2022 Oral: Generalised Policy Improvement with Geometric Policy Composition »
Shantanu Thakoor · Mark Rowland · Diana Borsa · Will Dabney · Remi Munos · Andre Barreto -
2022 Poster: Scalable Deep Reinforcement Learning Algorithms for Mean Field Games »
Mathieu Lauriere · Sarah Perrin · Sertan Girgin · Paul Muller · Ayush Jain · Theophile Cabannes · Georgios Piliouras · Julien Perolat · Romuald Elie · Olivier Pietquin · Matthieu Geist -
2022 Poster: Improving Language Models by Retrieving from Trillions of Tokens »
Sebastian Borgeaud · Arthur Mensch · Jordan Hoffmann · Trevor Cai · Eliza Rutherford · Katie Millican · George van den Driessche · Jean-Baptiste Lespiau · Bogdan Damoc · Aidan Clark · Diego de Las Casas · Aurelia Guy · Jacob Menick · Roman Ring · Tom Hennigan · Saffron Huang · Loren Maggiore · Chris Jones · Albin Cassirer · Andy Brock · Michela Paganini · Geoffrey Irving · Oriol Vinyals · Simon Osindero · Karen Simonyan · Jack Rae · Erich Elsen · Laurent Sifre -
2022 Poster: Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games »
Siqi Liu · Marc Lanctot · Luke Marris · Nicolas Heess -
2022 Spotlight: Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games »
Siqi Liu · Marc Lanctot · Luke Marris · Nicolas Heess -
2022 Spotlight: Improving Language Models by Retrieving from Trillions of Tokens »
Sebastian Borgeaud · Arthur Mensch · Jordan Hoffmann · Trevor Cai · Eliza Rutherford · Katie Millican · George van den Driessche · Jean-Baptiste Lespiau · Bogdan Damoc · Aidan Clark · Diego de Las Casas · Aurelia Guy · Jacob Menick · Roman Ring · Tom Hennigan · Saffron Huang · Loren Maggiore · Chris Jones · Albin Cassirer · Andy Brock · Michela Paganini · Geoffrey Irving · Oriol Vinyals · Simon Osindero · Karen Simonyan · Jack Rae · Erich Elsen · Laurent Sifre -
2022 Spotlight: Scalable Deep Reinforcement Learning Algorithms for Mean Field Games »
Mathieu Lauriere · Sarah Perrin · Sertan Girgin · Paul Muller · Ayush Jain · Theophile Cabannes · Georgios Piliouras · Julien Perolat · Romuald Elie · Olivier Pietquin · Matthieu Geist -
2021 Poster: Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games »
Dustin Morrill · Ryan D'Orazio · Marc Lanctot · James Wright · Michael Bowling · Amy Greenwald -
2021 Spotlight: Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games »
Dustin Morrill · Ryan D'Orazio · Marc Lanctot · James Wright · Michael Bowling · Amy Greenwald -
2021 Poster: Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning »
Tadashi Kozuno · Yunhao Tang · Mark Rowland · Remi Munos · Steven Kapturowski · Will Dabney · Michal Valko · David Abel -
2021 Poster: From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization »
Julien Perolat · Remi Munos · Jean-Baptiste Lespiau · Shayegan Omidshafiei · Mark Rowland · Pedro Ortega · Neil Burch · Thomas Anthony · David Balduzzi · Bart De Vylder · Georgios Piliouras · Marc Lanctot · Karl Tuyls -
2021 Poster: Taylor Expansion of Discount Factors »
Yunhao Tang · Mark Rowland · Remi Munos · Michal Valko -
2021 Spotlight: Taylor Expansion of Discount Factors »
Yunhao Tang · Mark Rowland · Remi Munos · Michal Valko -
2021 Spotlight: Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning »
Tadashi Kozuno · Yunhao Tang · Mark Rowland · Remi Munos · Steven Kapturowski · Will Dabney · Michal Valko · David Abel -
2021 Spotlight: From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization »
Julien Perolat · Remi Munos · Jean-Baptiste Lespiau · Shayegan Omidshafiei · Mark Rowland · Pedro Ortega · Neil Burch · Thomas Anthony · David Balduzzi · Bart De Vylder · Georgios Piliouras · Marc Lanctot · Karl Tuyls -
2021 Poster: Counterfactual Credit Assignment in Model-Free Reinforcement Learning »
Thomas Mesnard · Theophane Weber · Fabio Viola · Shantanu Thakoor · Alaa Saade · Anna Harutyunyan · Will Dabney · Thomas Stepleton · Nicolas Heess · Arthur Guez · Eric Moulines · Marcus Hutter · Lars Buesing · Remi Munos -
2021 Poster: Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers »
Luke Marris · Paul Muller · Marc Lanctot · Karl Tuyls · Thore Graepel -
2021 Spotlight: Counterfactual Credit Assignment in Model-Free Reinforcement Learning »
Thomas Mesnard · Theophane Weber · Fabio Viola · Shantanu Thakoor · Alaa Saade · Anna Harutyunyan · Will Dabney · Thomas Stepleton · Nicolas Heess · Arthur Guez · Eric Moulines · Marcus Hutter · Lars Buesing · Remi Munos -
2021 Spotlight: Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers »
Luke Marris · Paul Muller · Marc Lanctot · Karl Tuyls · Thore Graepel -
2020 Poster: Monte-Carlo Tree Search as Regularized Policy Optimization »
Jean-Bastien Grill · Florent Altché · Yunhao Tang · Thomas Hubert · Michal Valko · Ioannis Antonoglou · Remi Munos -
2020 Poster: Revisiting Fundamentals of Experience Replay »
William Fedus · Prajit Ramachandran · Rishabh Agarwal · Yoshua Bengio · Hugo Larochelle · Mark Rowland · Will Dabney -
2020 Poster: Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning »
Zhaohan Guo · Bernardo Avila Pires · Bilal Piot · Jean-Bastien Grill · Florent Altché · Remi Munos · Mohammad Gheshlaghi Azar -
2020 Poster: Taylor Expansion Policy Optimization »
Yunhao Tang · Michal Valko · Remi Munos -
2019 Poster: Statistics and Samples in Distributional Reinforcement Learning »
Mark Rowland · Robert Dadashi · Saurabh Kumar · Remi Munos · Marc Bellemare · Will Dabney -
2019 Oral: Statistics and Samples in Distributional Reinforcement Learning »
Mark Rowland · Robert Dadashi · Saurabh Kumar · Remi Munos · Marc Bellemare · Will Dabney -
2019 Poster: Open-ended learning in symmetric zero-sum games »
David Balduzzi · Marta Garnelo · Yoram Bachrach · Wojciech Czarnecki · Julien Perolat · Max Jaderberg · Thore Graepel -
2019 Oral: Open-ended learning in symmetric zero-sum games »
David Balduzzi · Marta Garnelo · Yoram Bachrach · Wojciech Czarnecki · Julien Perolat · Max Jaderberg · Thore Graepel -
2018 Poster: The Uncertainty Bellman Equation and Exploration »
Brendan O'Donoghue · Ian Osband · Remi Munos · Vlad Mnih -
2018 Poster: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures »
Lasse Espeholt · Hubert Soyer · Remi Munos · Karen Simonyan · Vlad Mnih · Tom Ward · Yotam Doron · Vlad Firoiu · Tim Harley · Iain Dunning · Shane Legg · Koray Kavukcuoglu -
2018 Poster: Efficient Neural Audio Synthesis »
Nal Kalchbrenner · Erich Elsen · Karen Simonyan · Seb Noury · Norman Casagrande · Edward Lockhart · Florian Stimberg · Aäron van den Oord · Sander Dieleman · Koray Kavukcuoglu -
2018 Poster: Autoregressive Quantile Networks for Generative Modeling »
Georg Ostrovski · Will Dabney · Remi Munos -
2018 Oral: The Uncertainty Bellman Equation and Exploration »
Brendan O'Donoghue · Ian Osband · Remi Munos · Vlad Mnih -
2018 Oral: Autoregressive Quantile Networks for Generative Modeling »
Georg Ostrovski · Will Dabney · Remi Munos -
2018 Oral: Efficient Neural Audio Synthesis »
Nal Kalchbrenner · Erich Elsen · Karen Simonyan · Seb Noury · Norman Casagrande · Edward Lockhart · Florian Stimberg · Aäron van den Oord · Sander Dieleman · Koray Kavukcuoglu -
2018 Oral: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures »
Lasse Espeholt · Hubert Soyer · Remi Munos · Karen Simonyan · Vlad Mnih · Tom Ward · Yotam Doron · Vlad Firoiu · Tim Harley · Iain Dunning · Shane Legg · Koray Kavukcuoglu -
2018 Poster: The Mechanics of n-Player Differentiable Games »
David Balduzzi · Sebastien Racaniere · James Martens · Jakob Foerster · Karl Tuyls · Thore Graepel -
2018 Oral: The Mechanics of n-Player Differentiable Games »
David Balduzzi · Sebastien Racaniere · James Martens · Jakob Foerster · Karl Tuyls · Thore Graepel -
2018 Poster: Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement »
Andre Barreto · Diana Borsa · John Quan · Tom Schaul · David Silver · Matteo Hessel · Daniel J. Mankowitz · Augustin Zidek · Remi Munos -
2018 Poster: Learning to search with MCTSnets »
Arthur Guez · Theophane Weber · Ioannis Antonoglou · Karen Simonyan · Oriol Vinyals · Daan Wierstra · Remi Munos · David Silver -
2018 Poster: Implicit Quantile Networks for Distributional Reinforcement Learning »
Will Dabney · Georg Ostrovski · David Silver · Remi Munos -
2018 Oral: Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement »
Andre Barreto · Diana Borsa · John Quan · Tom Schaul · David Silver · Matteo Hessel · Daniel J. Mankowitz · Augustin Zidek · Remi Munos -
2018 Oral: Implicit Quantile Networks for Distributional Reinforcement Learning »
Will Dabney · Georg Ostrovski · David Silver · Remi Munos -
2018 Oral: Learning to search with MCTSnets »
Arthur Guez · Theophane Weber · Ioannis Antonoglou · Karen Simonyan · Oriol Vinyals · Daan Wierstra · Remi Munos · David Silver -
2017 Poster: Count-Based Exploration with Neural Density Models »
Georg Ostrovski · Marc Bellemare · Aäron van den Oord · Remi Munos -
2017 Talk: Count-Based Exploration with Neural Density Models »
Georg Ostrovski · Marc Bellemare · Aäron van den Oord · Remi Munos -
2017 Poster: A Distributional Perspective on Reinforcement Learning »
Marc Bellemare · Will Dabney · Remi Munos -
2017 Poster: Automated Curriculum Learning for Neural Networks »
Alex Graves · Marc Bellemare · Jacob Menick · Remi Munos · Koray Kavukcuoglu -
2017 Poster: Minimax Regret Bounds for Reinforcement Learning »
Mohammad Gheshlaghi Azar · Ian Osband · Remi Munos -
2017 Talk: A Distributional Perspective on Reinforcement Learning »
Marc Bellemare · Will Dabney · Remi Munos -
2017 Talk: Automated Curriculum Learning for Neural Networks »
Alex Graves · Marc Bellemare · Jacob Menick · Remi Munos · Koray Kavukcuoglu -
2017 Talk: Minimax Regret Bounds for Reinforcement Learning »
Mohammad Gheshlaghi Azar · Ian Osband · Remi Munos