Grounding Language to Entities and Dynamics for Generalization in Reinforcement Learning

Austin W. Hanjie · Victor Zhong · Karthik Narasimhan


Keywords: [ Reinforcement Learning and Planning ] [ Deep RL ]

Wed 21 Jul 9 a.m. PDT — 11 a.m. PDT
Spotlight presentation: Reinforcement Learning 11
Wed 21 Jul 5 a.m. PDT — 6 a.m. PDT


We investigate the use of natural language to drive the generalization of control policies and introduce Messenger, a new multi-task environment with free-form text manuals describing the environment dynamics. Unlike previous work, Messenger does not assume prior knowledge connecting text and state observations: the control policy must simultaneously ground the game manual to entity symbols and dynamics in the environment. We develop a new model, EMMA (Entity Mapper with Multi-modal Attention), which uses an entity-conditioned attention module that allows for selective focus over relevant descriptions in the manual for each entity in the environment. EMMA is end-to-end differentiable and learns a latent grounding of entities and dynamics from text to observations using only environment rewards. EMMA achieves successful zero-shot generalization to unseen games with new dynamics, obtaining a 40% higher win rate than multiple baselines. However, the win rate on the hardest stage of Messenger remains low (10%), demonstrating the need for additional work in this direction.
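To make the idea of entity-conditioned attention concrete, the following is a minimal sketch, not the authors' implementation. It assumes scaled dot-product attention, in which each entity supplies a query and each manual description supplies a key and a value. The function names (`softmax`, `attend`), the toy vectors, and the entity labels are all hypothetical and hand-set for illustration; in EMMA the corresponding representations are learned end to end from environment rewards.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(entity_query, desc_keys, desc_values):
    """Attend from one entity over all manual descriptions.

    entity_query: list of floats (hypothetical entity embedding)
    desc_keys:    one key vector per description
    desc_values:  one value vector per description
    Returns (text-conditioned entity representation, attention weights).
    """
    scale = math.sqrt(len(entity_query))
    # Scaled dot product between the entity query and each description key.
    scores = [
        sum(q * k for q, k in zip(entity_query, key)) / scale
        for key in desc_keys
    ]
    weights = softmax(scores)
    dim = len(desc_values[0])
    # Weighted sum of description values.
    rep = [
        sum(w * value[i] for w, value in zip(weights, desc_values))
        for i in range(dim)
    ]
    return rep, weights

# Toy manual with two descriptions (hand-set vectors, assumption for illustration).
desc_keys = [[4.0, 0.0], [0.0, 4.0]]
desc_values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# Two hypothetical entities, each aligned with one description.
entity_queries = {"entity_a": [4.0, 0.0], "entity_b": [0.0, 4.0]}

results = {
    name: attend(query, desc_keys, desc_values)
    for name, query in entity_queries.items()
}

for name, (rep, weights) in results.items():
    print(name, [round(w, 3) for w in weights], [round(r, 3) for r in rep])
```

In this toy setup each entity places almost all of its attention on the description whose key matches its query, so its representation is dominated by that description's value vector. This is the "selective focus" the abstract describes, under the stated assumptions.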
