Constructing AI models that respond to text instructions is challenging, especially for (multi-modal) sequential decision-making tasks. This study introduces an instruction-tuned Video Pretraining (VPT) model for Minecraft called STEVE-1, demonstrating that the unCLIP approach, utilized in DALL•E 2, is also effective for creating instruction-following sequential decision-making agents. STEVE-1 is trained in two steps: first adapting the pretrained VPT model to follow commands in MineCLIP's latent space, then training a prior to predict latent codes from text. This allows us to finetune VPT through self-supervised behavioral cloning and hindsight relabeling, bypassing the need for costly human text annotations. By leveraging pretrained models like VPT and MineCLIP and employing best practices from text-conditioned image generation, STEVE-1 costs just $60 to train and can follow a wide range of short-horizon open-ended text and visual instructions in Minecraft. We provide experimental evidence highlighting key factors for downstream performance, including pretraining, classifier-free guidance, and data scaling. All resources, including our model weights, datasets, and evaluation tools, are made available for further research.
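To make the two-stage recipe in the abstract concrete, below is a minimal PyTorch sketch of the unCLIP-style setup it describes: a goal-conditioned policy trained by behavioral cloning with hindsight-relabeled MineCLIP goals and goal dropout (enabling classifier-free guidance), plus a prior that maps text latents to visual goal latents. This is not the authors' implementation; the module names, embedding dimension, action-space size, and relabeling horizon are illustrative assumptions.

```python
# Minimal sketch (not the STEVE-1 codebase) of the two-stage unCLIP-style setup.
# Assumptions: 512-d MineCLIP latents, a flat placeholder action space, and
# simple MLPs standing in for the pretrained VPT backbone and the prior.
import torch
import torch.nn as nn

EMBED_DIM = 512                       # MineCLIP latent size (assumption)
NULL_GOAL = torch.zeros(EMBED_DIM)    # "no goal" embedding used for classifier-free guidance


class GoalConditionedPolicy(nn.Module):
    """Stand-in for the finetuned VPT policy: (observation features, goal latent) -> action logits."""
    def __init__(self, obs_dim=1024, n_actions=121):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + EMBED_DIM, 1024), nn.ReLU(),
            nn.Linear(1024, n_actions),
        )

    def forward(self, obs_feat, goal_embed):
        return self.net(torch.cat([obs_feat, goal_embed], dim=-1))


class TextToVisualPrior(nn.Module):
    """Step 2: predicts a MineCLIP *visual* goal latent from a MineCLIP *text* latent."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMBED_DIM, 1024), nn.ReLU(),
            nn.Linear(1024, EMBED_DIM),
        )

    def forward(self, text_embed):
        return self.net(text_embed)


def hindsight_relabel(episode_embeds, t, horizon=64):
    """Hindsight relabeling: take the MineCLIP embedding of a future point in the
    same episode as the goal for timestep t (the fixed horizon is an assumption)."""
    future_t = min(t + horizon, episode_embeds.shape[0] - 1)
    return episode_embeds[future_t]


def bc_loss(policy, obs_feat, goal_embed, expert_action, p_drop=0.1):
    """Behavioral cloning with goal dropout, which makes classifier-free guidance possible at test time."""
    if torch.rand(()) < p_drop:
        goal_embed = NULL_GOAL.expand_as(goal_embed)
    logits = policy(obs_feat, goal_embed)
    return nn.functional.cross_entropy(logits, expert_action)


@torch.no_grad()
def guided_logits(policy, obs_feat, goal_embed, guidance_scale=3.0):
    """Classifier-free guidance at inference: extrapolate conditional logits away from unconditional ones."""
    cond = policy(obs_feat, goal_embed)
    uncond = policy(obs_feat, NULL_GOAL.expand_as(goal_embed))
    return uncond + guidance_scale * (cond - uncond)
```

At inference time, a text instruction would be embedded with MineCLIP, mapped through the prior to a visual goal latent, and fed to the policy with guidance; the guidance scale trades off how strongly the agent commits to the instruction versus its unconditional prior behavior.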
Author Information
Shalev Lifshitz (Department of Computer Science, University of Toronto)
Keiran Paster (University of Toronto)
Harris Chan (University of Toronto, Vector Institute)
Jimmy Ba (University of Toronto / xAI)
Sheila McIlraith (University of Toronto and Vector Institute)
Sheila McIlraith is a Professor in the Department of Computer Science at the University of Toronto, a Canada CIFAR AI Chair (Vector Institute), and a Research Lead at the Schwartz Reisman Institute for Technology and Society. McIlraith's research is in the area of AI sequential decision making broadly construed, with a focus on human-compatible AI. McIlraith is a Fellow of the ACM and AAAI.
More from the Same Authors
- 2021 : On Low Rank Training of Deep Neural Networks
  Siddhartha Kamalakara · Acyr Locatelli · Bharat Venkitesh · Jimmy Ba · Yarin Gal · Aidan Gomez
- 2021 : AppBuddy: Learning to Accomplish Tasks in Mobile Apps via Reinforcement Learning
  Maayan Shvo · Zhiming Hu · Rodrigo A Toro Icarte · Iqbal Mohomed · Allan Jepson · Sheila McIlraith
- 2022 : Exploring Long-Horizon Reasoning with Deep RL in Combinatorially Hard Tasks
  Andrew C Li · Pashootan Vaezipoor · Rodrigo A Toro Icarte · Sheila McIlraith
- 2022 : You Can't Count on Luck: Why Decision Transformers Fail in Stochastic Environments
  Keiran Paster · Sheila McIlraith · Jimmy Ba
- 2023 : Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective
  Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu
- 2023 : Training on Thin Air: Improve Image Classification with Generated Data
  Yongchao Zhou · Hshmat Sahak · Jimmy Ba
- 2023 : Calibrating Language Models via Augmented Prompt Ensembles
  Mingjian Jiang · Yangjun Ruan · Sicong Huang · Saifei Liao · Silviu Pitis · Roger Grosse · Jimmy Ba
- 2023 : Using Synthetic Data for Data Augmentation to Improve Classification Accuracy
  Yongchao Zhou · Hshmat Sahak · Jimmy Ba
- 2023 : A Generative Model for Text Control in Minecraft
  Shalev Lifshitz · Keiran Paster · Harris Chan · Jimmy Ba · Sheila McIlraith
- 2023 Poster: TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation
  Zhaoyan Liu · Noël Vouitsis · Satya Krishna Gorti · Jimmy Ba · Gabriel Loaiza-Ganem
- 2023 Poster: Learning Belief Representations for Partially Observable Deep RL
  Andrew Wang · Andrew C Li · Toryn Q Klassen · Rodrigo A Toro Icarte · Sheila McIlraith
- 2021 Poster: LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning
  Yuhuai Wu · Markus Rabe · Wenda Li · Jimmy Ba · Roger Grosse · Christian Szegedy
- 2021 Spotlight: LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning
  Yuhuai Wu · Markus Rabe · Wenda Li · Jimmy Ba · Roger Grosse · Christian Szegedy
- 2021 Poster: LTL2Action: Generalizing LTL Instructions for Multi-Task RL
  Pashootan Vaezipoor · Andrew C Li · Rodrigo A Toro Icarte · Sheila McIlraith
- 2021 Spotlight: LTL2Action: Generalizing LTL Instructions for Multi-Task RL
  Pashootan Vaezipoor · Andrew C Li · Rodrigo A Toro Icarte · Sheila McIlraith
- 2020 Poster: Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning
  Silviu Pitis · Harris Chan · Stephen Zhao · Bradly Stadie · Jimmy Ba
- 2020 Poster: Improving Transformer Optimization Through Better Initialization
  Xiao Shi Huang · Felipe Perez · Jimmy Ba · Maksims Volkovs
- 2018 Poster: Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning
  Rodrigo A Toro Icarte · Toryn Q Klassen · Richard Valenzano · Sheila McIlraith
- 2018 Oral: Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning
  Rodrigo A Toro Icarte · Toryn Q Klassen · Richard Valenzano · Sheila McIlraith