On Local Policies for Graph-Structured Markov Decision Processes
Fathima Faizal ⋅ Asuman Ozdaglar ⋅ Martin Wainwright
Abstract
We study a cooperative form of multi-agent reinforcement learning in which the state dynamics and agent interactions are governed by an underlying graph. Each agent has a local state and action, and the evolution of its local state depends only on the states and actions in the $1$-hop neighborhood defined by the graph. Structured dynamics of this type arise in various applications, including network resource allocation, cooperative games, epidemic control, and wireless scheduling. The global state-action space scales exponentially in the number of agents, so that computing globally optimal policies is intractable in the worst case. We study conditions under which the optimal policy can be approximated by an $m$-hop local policy for each agent, one that depends only on its $m$-hop neighborhood. By controlling the propagation of influences via a Dobrushin-type stability matrix, we establish that globally optimal policies are sharply approximated by $m$-hop local policies whose sub-optimality gap decays exponentially in $m$.
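To illustrate the form such a guarantee takes (a schematic sketch, not the paper's precise statement; the constant $C$ and decay rate $\gamma \in (0,1)$ are hypothetical placeholders determined by the Dobrushin-type stability matrix), the value of a best $m$-hop local policy $\pi_m$ relative to a globally optimal policy $\pi^\star$ would satisfy
$$ V^{\pi^\star} - V^{\pi_m} \;\le\; C\,\gamma^m, $$
so that each additional hop of local information shrinks the sub-optimality gap by a constant multiplicative factor.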