Many reinforcement learning tasks can benefit from explicit planning based on an internal model of the environment. Previously, such planning components have been incorporated through a neural network that partially aligns with the computational graph of value iteration. Such networks have so far focused on restrictive environments (e.g., grid-worlds) and have modelled the planning procedure only indirectly. We relax these constraints, proposing a graph neural network (GNN) that executes the value iteration (VI) algorithm across arbitrary environment models, with direct supervision on the intermediate steps of VI. The results indicate that GNNs are able to model value iteration accurately, recovering favourable metrics and policies across a variety of out-of-distribution tests. This suggests that GNN executors with strong supervision are a viable component within deep reinforcement learning systems.
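For readers unfamiliar with the algorithm being imitated, the following is a minimal sketch of tabular value iteration on a small hand-made MDP. It is not the GNN executor described above. The transition format, the function names `value_iteration` and `greedy_policy`, the discount factor, and the toy MDP `EXAMPLE_MDP` are all assumptions made for illustration. The sketch records the value estimates after every sweep, since per-step values of this kind are what supervision on the intermediate steps of VI would presumably target.

```python
from typing import Dict, List, Tuple

# Hypothetical format: transitions[state][action] is a list of
# (probability, next_state, reward) outcomes.
Transitions = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def q_value(outcomes, values, gamma):
    """Expected return of one action: sum_p p * (r + gamma * V(s'))."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions: Transitions, gamma: float = 0.9,
                    tol: float = 1e-8, max_iters: int = 1000):
    """Run synchronous Bellman backups until the largest change is below tol.

    Returns the final values and the list of value estimates after each
    sweep (including the all-zero initialisation).
    """
    values = {s: 0.0 for s in transitions}
    trajectory = [dict(values)]
    for _ in range(max_iters):
        new_values = {
            s: max(q_value(o, values, gamma) for o in actions.values())
            for s, actions in transitions.items()
        }
        delta = max(abs(new_values[s] - values[s]) for s in transitions)
        values = new_values
        trajectory.append(dict(values))
        if delta < tol:
            break
    return values, trajectory


def greedy_policy(transitions: Transitions, values, gamma: float = 0.9):
    """Pick, in every state, the action with the highest one-step lookahead."""
    return {
        s: max(actions, key=lambda a: q_value(actions[a], values, gamma))
        for s, actions in transitions.items()
    }


# Toy three-state chain (an assumption, not from the paper):
# state 2 is absorbing; moving from state 1 to state 2 pays reward 1.
EXAMPLE_MDP: Transitions = {
    0: {0: [(1.0, 1, 0.0)], 1: [(1.0, 0, 0.0)]},
    1: {0: [(1.0, 2, 1.0)], 1: [(1.0, 0, 0.0)]},
    2: {0: [(1.0, 2, 0.0)]},
}

final_values, trajectory = value_iteration(EXAMPLE_MDP)
policy = greedy_policy(EXAMPLE_MDP, final_values)
```

Because the transition table is a plain dictionary keyed by state and action, the same routine applies to any finite MDP rather than only to grid-worlds, which mirrors the "arbitrary environment models" setting in the abstract.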
Andreea-Ioana Deac (Mila/Université de Montréal)