Timezone: »
We introduce algorithms based on natural actor-critic and analyze their sample complexity for solving two player zero-sum Markov games in the tabular case. Our results improve the best-known sample complexities of policy gradient/actor-critic methods for convergence to Nash equilibrium in the multi-agent setting. We use the error propagation scheme in approximate dynamic programming, recent advances for global convergence of policy gradient methods, temporal difference learning, and techniques from stochastic primal-dual optimization. Our algorithms feature two stages, requiring agents to agree on an etiquette before starting their interactions, which is feasible for instance in self-play. However, the agents only access to joint reward and joint next state and not to each other's actions or policies. Our complexity results match the best-known results for global convergence of policy gradient algorithms for single agent RL. We provide numerical verification of our methods for a two player bandit environment and a two player game, Alesia. We observe improved empirical performance as compared to the recently proposed optimistic gradient descent-ascent variant for Markov games.
Author Information
Ahmet Alacaoglu (University of Wisconsin-Madison)
Luca Viano (EPFL)
Niao He (ETH Zurich)
Volkan Cevher (EPFL)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Spotlight: A Natural Actor-Critic Framework for Zero-Sum Markov Games »
Wed. Jul 20th 09:40 -- 09:45 PM Room Room 318 - 320
More from the Same Authors
-
2021 : Sample Complexity and Overparameterization Bounds for Temporal Difference Learning with Neural Network Approximation »
Semih Cayci · Siddhartha Satpathi · Niao He · R Srikant -
2021 : Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation »
Semih Cayci · Niao He · R Srikant -
2022 : Robustness in deep learning: The width (good), the depth (bad), and the initialization (ugly) »
Zhenyu Zhu · Fanghui Liu · Grigorios Chrysos · Volkan Cevher -
2022 : Sound and Complete Verification of Polynomial Networks »
Elias Abad Rocamora · Mehmet Fatih Sahin · Fanghui Liu · Grigorios Chrysos · Volkan Cevher -
2023 Poster: When do Minimax-fair Learning and Empirical Risk Minimization Coincide? »
Harvineet Singh · Matthäus Kleindessner · Volkan Cevher · Rumi Chunara · Chris Russell -
2023 Poster: Convergence of first-order methods for nonconvex constrained optimization with dependent data »
Hanbaek Lyu · Ahmet Alacaoglu -
2023 Poster: Semi Bandit dynamics in Congestion Games: Convergence to Nash Equilibrium and No-Regret Guarantees. »
Ioannis Panageas · EFSTRATIOS PANTELEIMON SKOULAKIS · Luca Viano · Xiao Wang · Volkan Cevher -
2023 Poster: Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies »
Ilyas Fatkhullin · Anas Barakat · Anastasia Kireeva · Niao He -
2023 Poster: Benign Overfitting in Deep Neural Networks »
Zhenyu Zhu · Fanghui Liu · Grigorios Chrysos · Francesco Locatello · Volkan Cevher -
2023 Poster: Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space »
Anas Barakat · Ilyas Fatkhullin · Niao He -
2023 Poster: What can online reinforcement learning benefit from coverage conditions? »
Fanghui Liu · Luca Viano · Volkan Cevher -
2023 Poster: Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games »
Batuhan Yardim · Semih Cayci · Matthieu Geist · Niao He -
2023 Oral: Semi Bandit dynamics in Congestion Games: Convergence to Nash Equilibrium and No-Regret Guarantees. »
Ioannis Panageas · EFSTRATIOS PANTELEIMON SKOULAKIS · Luca Viano · Xiao Wang · Volkan Cevher -
2022 Poster: Score Matching Enables Causal Discovery of Nonlinear Additive Noise Models »
Paul Rolland · Volkan Cevher · Matthäus Kleindessner · Chris Russell · Dominik Janzing · Bernhard Schölkopf · Francesco Locatello -
2022 Poster: UnderGrad: A Universal Black-Box Optimization Method with Almost Dimension-Free Convergence Rate Guarantees »
Kimon Antonakopoulos · Dong Quan Vu · Volkan Cevher · Kfir Levy · Panayotis Mertikopoulos -
2022 Oral: UnderGrad: A Universal Black-Box Optimization Method with Almost Dimension-Free Convergence Rate Guarantees »
Kimon Antonakopoulos · Dong Quan Vu · Volkan Cevher · Kfir Levy · Panayotis Mertikopoulos -
2022 Oral: Score Matching Enables Causal Discovery of Nonlinear Additive Noise Models »
Paul Rolland · Volkan Cevher · Matthäus Kleindessner · Chris Russell · Dominik Janzing · Bernhard Schölkopf · Francesco Locatello -
2021 Workshop: Workshop on Reinforcement Learning Theory »
Shipra Agrawal · Simon Du · Niao He · Csaba Szepesvari · Lin Yang -
2021 Poster: The Limits of Min-Max Optimization Algorithms: Convergence to Spurious Non-Critical Sets »
Ya-Ping Hsieh · Panayotis Mertikopoulos · Volkan Cevher -
2021 Poster: Regret Minimization in Stochastic Non-Convex Learning via a Proximal-Gradient Approach »
Nadav Hallak · Panayotis Mertikopoulos · Volkan Cevher -
2021 Spotlight: Regret Minimization in Stochastic Non-Convex Learning via a Proximal-Gradient Approach »
Nadav Hallak · Panayotis Mertikopoulos · Volkan Cevher -
2021 Oral: The Limits of Min-Max Optimization Algorithms: Convergence to Spurious Non-Critical Sets »
Ya-Ping Hsieh · Panayotis Mertikopoulos · Volkan Cevher -
2020 Poster: Efficient Proximal Mapping of the 1-path-norm of Shallow Networks »
Fabian Latorre · Paul Rolland · Shaul Nadav Hallak · Volkan Cevher -
2020 Poster: Conditional gradient methods for stochastically constrained convex minimization »
Maria-Luiza Vladarean · Ahmet Alacaoglu · Ya-Ping Hsieh · Volkan Cevher -
2020 Poster: Random extrapolation for primal-dual coordinate descent »
Ahmet Alacaoglu · Olivier Fercoq · Volkan Cevher -
2020 Poster: Double-Loop Unadjusted Langevin Algorithm »
Paul Rolland · Armin Eftekhari · Ali Kavis · Volkan Cevher -
2020 Poster: A new regret analysis for Adam-type algorithms »
Ahmet Alacaoglu · Yura Malitsky · Panayotis Mertikopoulos · Volkan Cevher -
2019 Poster: Almost surely constrained convex optimization »
Olivier Fercoq · Ahmet Alacaoglu · Ion Necoara · Volkan Cevher -
2019 Poster: Finding Mixed Nash Equilibria of Generative Adversarial Networks »
Ya-Ping Hsieh · Chen Liu · Volkan Cevher -
2019 Poster: Efficient learning of smooth probability functions from Bernoulli tests with guarantees »
Paul Rolland · Ali Kavis · Alexander Niklaus Immer · Adish Singla · Volkan Cevher -
2019 Oral: Finding Mixed Nash Equilibria of Generative Adversarial Networks »
Ya-Ping Hsieh · Chen Liu · Volkan Cevher -
2019 Oral: Efficient learning of smooth probability functions from Bernoulli tests with guarantees »
Paul Rolland · Ali Kavis · Alexander Niklaus Immer · Adish Singla · Volkan Cevher -
2019 Oral: Almost surely constrained convex optimization »
Olivier Fercoq · Ahmet Alacaoglu · Ion Necoara · Volkan Cevher -
2019 Poster: On Certifying Non-Uniform Bounds against Adversarial Attacks »
Chen Liu · Ryota Tomioka · Volkan Cevher -
2019 Poster: Conditional Gradient Methods via Stochastic Path-Integrated Differential Estimator »
Alp Yurtsever · Suvrit Sra · Volkan Cevher -
2019 Poster: A Conditional-Gradient-Based Augmented Lagrangian Framework »
Alp Yurtsever · Olivier Fercoq · Volkan Cevher -
2019 Oral: Conditional Gradient Methods via Stochastic Path-Integrated Differential Estimator »
Alp Yurtsever · Suvrit Sra · Volkan Cevher -
2019 Oral: A Conditional-Gradient-Based Augmented Lagrangian Framework »
Alp Yurtsever · Olivier Fercoq · Volkan Cevher -
2019 Oral: On Certifying Non-Uniform Bounds against Adversarial Attacks »
Chen Liu · Ryota Tomioka · Volkan Cevher -
2018 Poster: A Conditional Gradient Framework for Composite Convex Minimization with Applications to Semidefinite Programming »
Alp Yurtsever · Olivier Fercoq · Francesco Locatello · Volkan Cevher -
2018 Oral: A Conditional Gradient Framework for Composite Convex Minimization with Applications to Semidefinite Programming »
Alp Yurtsever · Olivier Fercoq · Francesco Locatello · Volkan Cevher -
2018 Poster: Let’s be Honest: An Optimal No-Regret Framework for Zero-Sum Games »
Ehsan Asadi Kangarshahi · Ya-Ping Hsieh · Mehmet Fatih Sahin · Volkan Cevher -
2018 Poster: Optimal Distributed Learning with Multi-pass Stochastic Gradient Methods »
Junhong Lin · Volkan Cevher -
2018 Oral: Let’s be Honest: An Optimal No-Regret Framework for Zero-Sum Games »
Ehsan Asadi Kangarshahi · Ya-Ping Hsieh · Mehmet Fatih Sahin · Volkan Cevher -
2018 Oral: Optimal Distributed Learning with Multi-pass Stochastic Gradient Methods »
Junhong Lin · Volkan Cevher -
2018 Poster: Optimal Rates of Sketched-regularized Algorithms for Least-Squares Regression over Hilbert Spaces »
Junhong Lin · Volkan Cevher -
2018 Oral: Optimal Rates of Sketched-regularized Algorithms for Least-Squares Regression over Hilbert Spaces »
Junhong Lin · Volkan Cevher -
2017 Poster: Robust Submodular Maximization: A Non-Uniform Partitioning Approach »
Ilija Bogunovic · Slobodan Mitrovic · Jonathan Scarlett · Volkan Cevher -
2017 Talk: Robust Submodular Maximization: A Non-Uniform Partitioning Approach »
Ilija Bogunovic · Slobodan Mitrovic · Jonathan Scarlett · Volkan Cevher