Poster
Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games
Batuhan Yardim · Semih Cayci · Matthieu Geist · Niao He
Mean-field games have been used as a theoretical tool to obtain an approximate Nash equilibrium for symmetric and anonymous $N$-player games. However, existing theoretical results assume variations of a "population generative model", which allows arbitrary modifications of the population distribution by the learning algorithm, and this limits their applicability. Moreover, learning algorithms typically work on abstract simulators with a population instead of the $N$-player game. In contrast, we show that $N$ agents running policy mirror ascent converge to the Nash equilibrium of the regularized game within $\widetilde{\mathcal{O}}(\varepsilon^{-2})$ samples from a single sample trajectory without a population generative model, up to a standard $\mathcal{O}(\frac{1}{\sqrt{N}})$ error due to the mean field. Departing from the literature, instead of working with the best-response map, we first show that a policy mirror ascent map can be used to construct a contractive operator having the Nash equilibrium as its fixed point. We analyze single-path TD learning for $N$-agent games, proving sample complexity guarantees using only a sample path from the $N$-agent simulator without a population generative model. Furthermore, we demonstrate that our methodology allows for independent learning by $N$ agents with finite sample guarantees.
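To make the idea of a policy mirror ascent map with a fixed point more concrete, here is a minimal sketch of one generic tabular step. It assumes a KL (Bregman) proximity term and an entropy regularizer, which need not match the operator analyzed in the paper. The function name `mirror_ascent_step`, the step size `eta`, the regularization weight `tau`, and the fixed table `q_values` are all hypothetical choices made for illustration.

```python
import numpy as np


def mirror_ascent_step(policy, q_values, eta=1.0, tau=1.0):
    """One entropy-regularized mirror ascent step with a KL proximity term.

    Illustrative assumption, not the paper's exact operator. For each state s
    it solves
        max_p  eta * (<q(s, .), p> + tau * H(p)) - KL(p || policy(s, .)),
    whose closed form is
        p  proportional to  policy^(1 / (1 + eta*tau)) * exp(eta * q / (1 + eta*tau)).

    policy   : array of shape (S, A), rows strictly positive and summing to 1
    q_values : array of shape (S, A), action-value estimates
    """
    logits = (np.log(policy) + eta * q_values) / (1.0 + eta * tau)
    # Subtract the row-wise maximum for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    unnormalized = np.exp(logits)
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)
```

In this simplified setting, where `q_values` is held fixed and does not depend on the population, repeated application of the step contracts in log-policy space by a factor of 1 / (1 + eta * tau) and converges to the softmax of `q_values / tau`. The mean-field setting in the abstract is harder because the values also change with the population distribution.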
Author Information
Batuhan Yardim (ETH Zurich)
Semih Cayci (Rheinisch Westfälische Technische Hochschule Aachen)
Matthieu Geist (Google)
Niao He (ETH Zurich)
More from the Same Authors
-
2021 : Sample Complexity and Overparameterization Bounds for Temporal Difference Learning with Neural Network Approximation »
Semih Cayci · Siddhartha Satpathi · Niao He · R Srikant -
2021 : Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation »
Semih Cayci · Niao He · R Srikant -
2021 : A functional mirror ascent view of policy gradient methods with function approximation »
Sharan Vaswani · Olivier Bachem · Simone Totaro · Matthieu Geist · Marlos C. Machado · Pablo Samuel Castro · Nicolas Le Roux -
2021 : Offline Reinforcement Learning as Anti-Exploration »
Shideh Rezaeifar · Robert Dadashi · Nino Vieillard · Léonard Hussenot · Olivier Bachem · Olivier Pietquin · Matthieu Geist -
2023 Poster: A Connection between One-Step RL and Critic Regularization in Reinforcement Learning »
Benjamin Eysenbach · Matthieu Geist · Sergey Levine · Ruslan Salakhutdinov -
2023 Poster: Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space »
Anas Barakat · Ilyas Fatkhullin · Niao He -
2023 Poster: Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies »
Ilyas Fatkhullin · Anas Barakat · Anastasia Kireeva · Niao He -
2023 Poster: Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice »
Toshinori Kitamura · Tadashi Kozuno · Yunhao Tang · Nino Vieillard · Michal Valko · Wenhao Yang · Jincheng Mei · Pierre Menard · Mohammad Gheshlaghi Azar · Remi Munos · Olivier Pietquin · Matthieu Geist · Csaba Szepesvari · Wataru Kumagai · Yutaka Matsuo -
2022 Poster: Large Batch Experience Replay »
Thibault Lahire · Matthieu Geist · Emmanuel Rachelson -
2022 Poster: Continuous Control with Action Quantization from Demonstrations »
Robert Dadashi · Léonard Hussenot · Damien Vincent · Sertan Girgin · Anton Raichuk · Matthieu Geist · Olivier Pietquin -
2022 Oral: Large Batch Experience Replay »
Thibault Lahire · Matthieu Geist · Emmanuel Rachelson -
2022 Spotlight: Continuous Control with Action Quantization from Demonstrations »
Robert Dadashi · Léonard Hussenot · Damien Vincent · Sertan Girgin · Anton Raichuk · Matthieu Geist · Olivier Pietquin -
2022 Poster: A Natural Actor-Critic Framework for Zero-Sum Markov Games »
Ahmet Alacaoglu · Luca Viano · Niao He · Volkan Cevher -
2022 Spotlight: A Natural Actor-Critic Framework for Zero-Sum Markov Games »
Ahmet Alacaoglu · Luca Viano · Niao He · Volkan Cevher -
2022 Poster: Scalable Deep Reinforcement Learning Algorithms for Mean Field Games »
Mathieu Lauriere · Sarah Perrin · Sertan Girgin · Paul Muller · Ayush Jain · Theophile Cabannes · Georgios Piliouras · Julien Perolat · Romuald Elie · Olivier Pietquin · Matthieu Geist -
2022 Spotlight: Scalable Deep Reinforcement Learning Algorithms for Mean Field Games »
Mathieu Lauriere · Sarah Perrin · Sertan Girgin · Paul Muller · Ayush Jain · Theophile Cabannes · Georgios Piliouras · Julien Perolat · Romuald Elie · Olivier Pietquin · Matthieu Geist -
2021 Workshop: Workshop on Reinforcement Learning Theory »
Shipra Agrawal · Simon Du · Niao He · Csaba Szepesvari · Lin Yang -
2021 Poster: Hyperparameter Selection for Imitation Learning »
Léonard Hussenot · Marcin Andrychowicz · Damien Vincent · Robert Dadashi · Anton Raichuk · Sabela Ramos · Nikola Momchev · Sertan Girgin · Raphael Marinier · Lukasz Stafiniak · Emmanuel Orsini · Olivier Bachem · Matthieu Geist · Olivier Pietquin -
2021 Oral: Hyperparameter Selection for Imitation Learning »
Léonard Hussenot · Marcin Andrychowicz · Damien Vincent · Robert Dadashi · Anton Raichuk · Sabela Ramos · Nikola Momchev · Sertan Girgin · Raphael Marinier · Lukasz Stafiniak · Emmanuel Orsini · Olivier Bachem · Matthieu Geist · Olivier Pietquin -
2021 Poster: Offline Reinforcement Learning with Pseudometric Learning »
Robert Dadashi · Shideh Rezaeifar · Nino Vieillard · Léonard Hussenot · Olivier Pietquin · Matthieu Geist -
2021 Spotlight: Offline Reinforcement Learning with Pseudometric Learning »
Robert Dadashi · Shideh Rezaeifar · Nino Vieillard · Léonard Hussenot · Olivier Pietquin · Matthieu Geist -
2019 Poster: A Theory of Regularized Markov Decision Processes »
Matthieu Geist · Bruno Scherrer · Olivier Pietquin -
2019 Poster: Learning from a Learner »
alexis jacq · Matthieu Geist · Ana Paiva · Olivier Pietquin -
2019 Oral: A Theory of Regularized Markov Decision Processes »
Matthieu Geist · Bruno Scherrer · Olivier Pietquin -
2019 Oral: Learning from a Learner »
alexis jacq · Matthieu Geist · Ana Paiva · Olivier Pietquin