Timezone: »

Learning Pareto-Optimal Policies in Low-Rank Cooperative Markov Games
Abhimanyu Dubey · Alex Sandy' Pentland
We study cooperative multi-agent reinforcement learning in episodic Markov games with $n$ agents. While the global state and action spaces typically grow exponentially in $n$ in this setting, in this paper, we introduce a novel framework that combines function approximation and a graphical dependence structure that restricts the decision space to $o(dn)$ for each agent, where $d$ is the ambient dimensionality of the problem. We present a multi-agent value iteration algorithm that, under mild assumptions, provably recovers the set of Pareto-optimal policies, with finite-sample guarantees on the incurred regret. Furthermore, we demonstrate that our algorithm is {\em no-regret} even when there are only $\cO(1)$ episodes with communication, providing a scalable and provably no-regret algorithm for multi-agent reinforcement learning with function approximation. Our work provides a tractable approach to multi-agent decision-making that is provably efficient and amenable to large-scale collaborative systems.

#### Author Information

##### Abhimanyu Dubey (Massachusetts Institute of Technology)

I am a PhD student in the Human Dynamics group at MIT, advised by Professor Alex Pentland. My research interests are in robust and cooperative machine learning, including problems in multi-agent decision-making and transfer learning. Prior to this, I received a master's degree in Computer Science and bachelor's degree in Electrical Engineering at IIT Delhi, where I was advised by Professor Sumeet Agarwal. I've also spent time as a research intern at Facebook AI, and was a post-baccalaureate fellow at the Department of Economics at Harvard, under Professor Ed Glaeser. My research has been supported by a Snap Research Scholarship (2019) and an Emerging Worlds Fellowship (2017).