Instruction-following agents must ground language into their observation and action spaces. Learning to ground language is challenging, typically requiring domain-specific engineering or large quantities of human interaction data. To address this challenge, we propose using pretrained vision-language models (VLMs) to supervise embodied agents. We combine ideas from model distillation and hindsight experience replay (HER), using a VLM to retroactively generate language describing the agent's behavior. Simple prompting allows us to control the supervision signal, teaching an agent to interact with novel objects based on their names (e.g., planes) or their features (e.g., colors) in a 3D rendered environment. Few-shot prompting lets us teach abstract category membership, including pre-existing categories (food vs. toys) and ad-hoc ones (arbitrary preferences over objects). Our work outlines a new and effective way to use internet-scale VLMs, repurposing the generic language grounding acquired by such models to teach task-relevant groundings to embodied agents.
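The core idea in the abstract, hindsight relabeling with a pretrained VLM, can be summarized as a short training loop. The sketch below is a minimal illustration under assumed interfaces: `vlm_describe`, `agent.rollout`, `agent.update_behavioral_cloning`, and the instruction template are hypothetical placeholders, not the paper's actual code.

```python
# Minimal sketch of hindsight relabeling with a pretrained VLM.
# All interfaces here (vlm_describe, agent.rollout, agent.update_behavioral_cloning,
# the instruction template) are assumed placeholders, not the paper's implementation.
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    observations: List        # rendered frames from the 3D environment
    actions: List
    instruction: str = ""     # filled in retroactively by the VLM


def vlm_describe(frames, prompt: str) -> str:
    """Query a pretrained VLM with the final frame(s) and a task-specific prompt,
    e.g. 'What object did the agent pick up?'. The prompt (or its few-shot
    examples) controls whether the label is a name, a feature, or a category."""
    raise NotImplementedError  # assumed VLM API


def relabel_with_vlm(traj: Trajectory, prompt: str) -> Trajectory:
    # Hindsight step: describe what the agent actually did, regardless of
    # any instruction it was originally given.
    description = vlm_describe(traj.observations[-1:], prompt)
    traj.instruction = f"Pick up the {description}."  # illustrative template
    return traj


def training_loop(env, agent, buffer, prompt, num_episodes=1000):
    for _ in range(num_episodes):
        traj = agent.rollout(env)                   # exploratory behavior
        buffer.append(relabel_with_vlm(traj, prompt))
        agent.update_behavioral_cloning(buffer)     # distill VLM labels into the policy
```

The relabeled trajectories serve as supervised (instruction, behavior) pairs, so the VLM's general-purpose grounding is distilled into a task-specific instruction-following policy.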
Author Information
Theodore R Sumers (Princeton University)
Kenneth Marino (Google DeepMind)
Arun Ahuja (DeepMind)
Rob Fergus (Facebook / NYU)
Ishita Dasgupta (DeepMind)
More from the Same Authors
- 2022 : How to Talk so Robots will Learn: Instructions, Descriptions, and Alignment
  Theodore R Sumers
- 2022 : Predicting Human Similarity Judgments Using Large Language Models
  Raja Marjieh · Ilia Sucholutsky · Theodore R Sumers · Nori Jacoby · Thomas Griffiths
- 2023 : Accelerating exploration and representation learning with offline pre-training
  Bogdan Mazoure · Jake Bruce · Doina Precup · Rob Fergus · Ankit Anand
- 2023 : Panel on Reasoning Capabilities of LLMs
  Guy Van den Broeck · Ishita Dasgupta · Subbarao Kambhampati · Jiajun Wu · Xi Victoria Lin · Samy Bengio · Beliz Gunel
- 2023 : Reasoning Biases in Language Models
  Ishita Dasgupta
- 2023 Poster: Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC
  Yilun Du · Conor Durkan · Robin Strudel · Josh Tenenbaum · Sander Dieleman · Rob Fergus · Jascha Sohl-Dickstein · Arnaud Doucet · Will Grathwohl
- 2022 : Session 2: Reasoning in Brains vs Machines
  Emily Mackevicius · Kim Stachenfeld · tyler bonnen · Ishita Dasgupta
- 2022 Poster: Tell me why! Explanations support learning relational and causal structure
  Andrew Lampinen · Nicholas Roy · Ishita Dasgupta · Stephanie Chan · Allison Tam · James McClelland · Chen Yan · Adam Santoro · Neil Rabinowitz · Jane Wang · Felix Hill
- 2022 Spotlight: Tell me why! Explanations support learning relational and causal structure
  Andrew Lampinen · Nicholas Roy · Ishita Dasgupta · Stephanie Chan · Allison Tam · James McClelland · Chen Yan · Adam Santoro · Neil Rabinowitz · Jane Wang · Felix Hill
- 2022 Poster: Distinguishing rule- and exemplar-based generalization in learning systems
  Ishita Dasgupta · Erin Grant · Thomas Griffiths
- 2022 Spotlight: Distinguishing rule- and exemplar-based generalization in learning systems
  Ishita Dasgupta · Erin Grant · Thomas Griffiths
- 2021 Oral: Decoupling Value and Policy for Generalization in Reinforcement Learning
  Roberta Raileanu · Rob Fergus
- 2021 Poster: Decoupling Value and Policy for Generalization in Reinforcement Learning
  Roberta Raileanu · Rob Fergus
- 2021 Poster: Reinforcement Learning with Prototypical Representations
  Denis Yarats · Rob Fergus · Alessandro Lazaric · Lerrel Pinto
- 2021 Spotlight: Reinforcement Learning with Prototypical Representations
  Denis Yarats · Rob Fergus · Alessandro Lazaric · Lerrel Pinto
- 2018 Poster: Stochastic Video Generation with a Learned Prior
  Emily Denton · Rob Fergus
- 2018 Oral: Stochastic Video Generation with a Learned Prior
  Emily Denton · Rob Fergus