Skip to yearly menu bar Skip to main content

Workshop: Workshop on Reinforcement Learning Theory

Learning Pareto-Optimal Policies in Low-Rank Cooperative Markov Games

Abhimanyu Dubey · Alex `Sandy' Pentland

Abstract: We study cooperative multi-agent reinforcement learning in episodic Markov games with $n$ agents. While the global state and action spaces typically grow exponentially in $n$ in this setting, in this paper, we introduce a novel framework that combines function approximation and a graphical dependence structure that restricts the decision space to $o(dn)$ for each agent, where $d$ is the ambient dimensionality of the problem. We present a multi-agent value iteration algorithm that, under mild assumptions, provably recovers the set of Pareto-optimal policies, with finite-sample guarantees on the incurred regret. Furthermore, we demonstrate that our algorithm is {\em no-regret} even when there are only $\cO(1)$ episodes with communication, providing a scalable and provably no-regret algorithm for multi-agent reinforcement learning with function approximation. Our work provides a tractable approach to multi-agent decision-making that is provably efficient and amenable to large-scale collaborative systems.

Chat is not available.