ICML Learning Pareto-Optimal Policies in Low-Rank Cooperative Markov Games

Poster
in
Workshop: Workshop on Reinforcement Learning Theory

Learning Pareto-Optimal Policies in Low-Rank Cooperative Markov Games

Abhimanyu Dubey · Alex `Sandy' Pentland

[ Abstract ]

[ Visit Poster at Spot C2 in Virtual World ]

Abstract: We study cooperative multi-agent reinforcement learning in episodic Markov games with

n

$n$ agents. While the global state and action spaces typically grow exponentially in

n

$n$ in this setting, in this paper, we introduce a novel framework that combines function approximation and a graphical dependence structure that restricts the decision space to

o (d n)

$o(dn)$ for each agent, where

d

$d$ is the ambient dimensionality of the problem. We present a multi-agent value iteration algorithm that, under mild assumptions, provably recovers the set of Pareto-optimal policies, with finite-sample guarantees on the incurred regret. Furthermore, we demonstrate that our algorithm is {\em no-regret} even when there are only

\cO (1)

$\cO(1)$ episodes with communication, providing a scalable and provably no-regret algorithm for multi-agent reinforcement learning with function approximation. Our work provides a tractable approach to multi-agent decision-making that is provably efficient and amenable to large-scale collaborative systems.

Chat is not available.

Poster in Workshop: Workshop on Reinforcement Learning Theory

Learning Pareto-Optimal Policies in Low-Rank Cooperative Markov Games

Abhimanyu Dubey · Alex `Sandy' Pentland

Poster
in
Workshop: Workshop on Reinforcement Learning Theory