Timezone: »
Poster
Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games
Dylan Foster · Noah Golowich · Sham Kakade
We consider the problem of decentralized multi-agent reinforcement learning in Markov games. A fundamental question is whether there exist algorithms that, when run independently by all agents, lead to no-regret for each player, analogous to celebrated convergence results for no-regret learning in normal-form games. While recent work has shown that such algorithms exist for restricted settings (notably, when regret is defined with respect to deviations to Markov policies), the question of whether independent no-regret learning can be achieved in the standard Markov game framework was open. We provide a decisive negative resolution to this problem, both from a computational and statistical perspective. We show that: • Under the complexity-theoretic assumption that PPAD $\neq$ P, there is no polynomial-time algorithm that attains no-regret in two-player general-sum Markov games when executed independently by all players, even when the game is known to the algorithm designer. • When the game is unknown, no algorithm, efficient or otherwise, can achieve no-regret without observing exponentially many episodes in the number of players. These results are proven via lower bounds for a simpler problem we refer to as SparseCCE, in which the goal is to compute a coarse correlated equilibrium that is ``sparse'' in the sense that it can be represented as a mixture of a small number of product policies.
Author Information
Dylan Foster (Microsoft Research)
Noah Golowich (Massachusetts Institute of Technology)
Sham Kakade (Harvard University and Amazon Scholar)
More from the Same Authors
-
2022 : Interaction-Grounded Learning with Action-inclusive Feedback »
Tengyang Xie · Akanksha Saran · Dylan Foster · Lekan Molu · Ida Momennejad · Nan Jiang · Paul Mineiro · John Langford -
2022 : The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift »
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2023 : Predicting Task Forgetting in Large Language Models »
Anat Kleiman · Jonathan Frankle · Sham Kakade · Mansheej Paul -
2023 Poster: Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL »
Zakaria Mhammedi · Dylan Foster · Alexander Rakhlin -
2023 Poster: Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron »
Jingfeng Wu · Difan Zou · Zixiang Chen · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2023 Poster: On Provable Copyright Protection for Generative Models »
Nikhil Vyas · Sham Kakade · Boaz Barak -
2023 Oral: Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL »
Zakaria Mhammedi · Dylan Foster · Alexander Rakhlin -
2022 Poster: Sparsity in Partially Controllable Linear Systems »
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang -
2022 Poster: Contextual Bandits with Large Action Spaces: Made Practical »
Yinglun Zhu · Dylan Foster · John Langford · Paul Mineiro -
2022 Poster: Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression »
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2022 Poster: Understanding Contrastive Learning Requires Incorporating Inductive Biases »
Nikunj Umesh Saunshi · Jordan Ash · Surbhi Goel · Dipendra Kumar Misra · Cyril Zhang · Sanjeev Arora · Sham Kakade · Akshay Krishnamurthy -
2022 Spotlight: Sparsity in Partially Controllable Linear Systems »
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang -
2022 Oral: Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression »
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2022 Spotlight: Understanding Contrastive Learning Requires Incorporating Inductive Biases »
Nikunj Umesh Saunshi · Jordan Ash · Surbhi Goel · Dipendra Kumar Misra · Cyril Zhang · Sanjeev Arora · Sham Kakade · Akshay Krishnamurthy -
2022 Spotlight: Contextual Bandits with Large Action Spaces: Made Practical »
Yinglun Zhu · Dylan Foster · John Langford · Paul Mineiro -
2022 Poster: Inductive Biases and Variable Creation in Self-Attention Mechanisms »
Benjamin Edelman · Surbhi Goel · Sham Kakade · Cyril Zhang -
2022 Spotlight: Inductive Biases and Variable Creation in Self-Attention Mechanisms »
Benjamin Edelman · Surbhi Goel · Sham Kakade · Cyril Zhang -
2022 : Q&A II »
Dylan Foster · Alexander Rakhlin -
2022 : Bridging Learning and Decision Making: Part II »
Dylan Foster -
2022 : Q&A »
Dylan Foster · Alexander Rakhlin -
2022 Tutorial: Bridging Learning and Decision Making »
Dylan Foster · Alexander Rakhlin -
2020 Poster: Naive Exploration is Optimal for Online LQR »
Max Simchowitz · Dylan Foster -
2020 Poster: Improved Bounds on Minimax Regret under Logarithmic Loss via Self-Concordance »
Blair Bilodeau · Dylan Foster · Daniel Roy -
2020 Poster: Logarithmic Regret for Adversarial Online Control »
Dylan Foster · Max Simchowitz -
2020 Poster: Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles »
Dylan Foster · Alexander Rakhlin -
2019 Poster: Distributed Learning with Sublinear Communication »
Jayadev Acharya · Christopher De Sa · Dylan Foster · Karthik Sridharan -
2019 Oral: Distributed Learning with Sublinear Communication »
Jayadev Acharya · Christopher De Sa · Dylan Foster · Karthik Sridharan -
2018 Poster: Practical Contextual Bandits with Regression Oracles »
Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire -
2018 Oral: Practical Contextual Bandits with Regression Oracles »
Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire