

Poster

Scalable Safe Policy Improvement for Factored Multi-Agent MDPs

Federico Bianchi · Edoardo Zorzi · Alberto Castellini · Thiago Simão · Matthijs T. J. Spaan · Alessandro Farinelli

Hall C 4-9 #1216
Wed 24 Jul 4:30 a.m. PDT — 6 a.m. PDT

Abstract:

In this work, we focus on safe policy improvement in multi-agent domains where current state-of-the-art methods cannot be applied effectively because of the large state and action spaces. We build on recent results that use Monte Carlo Tree Search for Safe Policy Improvement with Baseline Bootstrapping, and propose a novel algorithm that scales this approach to multi-agent domains by exploiting the factorization of the transition model and value function. Given a centralized behavior policy and a dataset of trajectories, our algorithm generates an improved policy by selecting joint actions with a novel extension of Max-Plus (or Variable Elimination) that constrains local actions to guarantee safety criteria. An empirical evaluation on multi-agent SysAdmin and multi-UAV Delivery shows that the approach scales to very large domains where state-of-the-art methods fail.
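To make the joint-action selection step concrete, the following is a minimal illustrative sketch (not the paper's implementation) of Variable Elimination on a pairwise coordination graph, where each agent's candidate actions are restricted to a "safe" set. All names (`safe_ve`, the toy payoff tables, the `allowed` action sets) are hypothetical; in the actual algorithm the payoffs would come from the factored value estimates and the constrained sets from a SPIBB-style criterion on dataset counts.

```python
from itertools import product


def eval_factor(scope, table, assignment):
    """Look up a factor's payoff for the local actions in `assignment`."""
    return table[tuple(assignment[v] for v in scope)]


def safe_ve(agents, factors, allowed):
    """Select a joint action by Variable Elimination on a coordination graph.

    agents:  agent ids, eliminated in the given order.
    factors: list of (scope, table); `table` maps a tuple of local actions
             (ordered as `scope`) to a real payoff.
    allowed: allowed[i] is agent i's safety-constrained action set
             (e.g., actions with enough data, plus the baseline action).
    Returns (best total payoff, joint action dict) over allowed actions only.
    """
    remaining = list(factors)
    br_stack = []      # (agent, scope, best-response table), in elim. order
    total = 0.0
    for i in agents:
        involved = [f for f in remaining if i in f[0]]
        remaining = [f for f in remaining if i not in f[0]]
        scope = tuple(sorted({v for s, _ in involved for v in s} - {i}))
        new_table, br = {}, {}
        for assign in product(*(allowed[v] for v in scope)):
            ctx = dict(zip(scope, assign))
            # maximize over agent i's *allowed* actions only (safety constraint)
            best = max(((sum(eval_factor(s, t, {**ctx, i: a})
                             for s, t in involved), a)
                        for a in allowed[i]), key=lambda p: p[0])
            new_table[assign], br[assign] = best
        br_stack.append((i, scope, br))
        if scope:
            remaining.append((scope, new_table))  # induced factor on neighbors
        else:
            total += new_table[()]                # component fully eliminated
    # back-substitute best responses in reverse elimination order
    joint = {}
    for i, scope, br in reversed(br_stack):
        joint[i] = br[tuple(joint[v] for v in scope)]
    return total, joint
```

For instance, with one pairwise factor over two agents, shrinking an agent's allowed set forces the maximization onto the remaining safe joint actions, which is how constraining local actions propagates to the joint choice.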
