
Workshop: ES-FoMo: Efficient Systems for Foundation Models

Compositional Interfaces for Compositional Generalization

Jelena Luketina · Jack Lanchantin · Sainbayar Sukhbaatar · Arthur Szlam


In this work, we study the effectiveness of a modular architecture for compositional generalization and transfer learning in the embodied agent setting. We develop an environment that allows us to independently vary perceptual modalities and action and task specifications, and use it to carefully analyze the agent’s performance across these compositions. We show that we can compose the agent’s perceptual suite, its task specifications, and its action spaces. Our experiments demonstrate zero-shot performance on held-out combinations of perception, instruction, and action space, as well as fast adaptation (requiring fewer samples) to new perceptual or action spaces without loss of performance.
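To make the held-out-combination evaluation concrete, here is a minimal sketch of how such a split could be constructed. It is not the authors' code: the component names and the `split_combinations` helper are hypothetical, and the only idea taken from the abstract is that test combinations are unseen as a whole while each individual perception, instruction, and action component still appears in training.

```python
from itertools import product

# Hypothetical component names; the abstract does not list the actual ones.
PERCEPTIONS = ["image", "text", "symbolic"]
INSTRUCTIONS = ["goal_text", "goal_image"]
ACTIONS = ["discrete", "continuous"]


def split_combinations(held_out):
    """Split all (perception, instruction, action) triples into train/test.

    Raises ValueError if a held-out triple is not a valid combination, or
    if it uses a component that never appears in any training triple,
    because zero-shot composition would then be ill-posed.
    """
    all_combos = list(product(PERCEPTIONS, INSTRUCTIONS, ACTIONS))
    held = set(held_out)
    unknown = held - set(all_combos)
    if unknown:
        raise ValueError(f"not valid combinations: {sorted(unknown)}")

    train = [c for c in all_combos if c not in held]
    test = [c for c in all_combos if c in held]

    # Each slot (0: perception, 1: instruction, 2: action) of every test
    # triple must be covered by at least one training triple.
    for slot in range(3):
        seen = {c[slot] for c in train}
        for combo in test:
            if combo[slot] not in seen:
                raise ValueError(f"{combo[slot]!r} never appears in training")
    return train, test


train, test = split_combinations([
    ("image", "goal_text", "continuous"),
    ("text", "goal_image", "discrete"),
])
print(len(train), "training combinations;", len(test), "held out")
```

Under these assumptions, an agent trained on `train` is evaluated zero-shot on `test`; holding out every triple that contains a given component is rejected, since that setting corresponds to adaptation to a new space rather than zero-shot composition.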
