Invited talk in Workshop: Interactive Learning with Implicit Human Feedback
Jesse Thomason: Considering The Role of Language in Embodied Systems
Pretrained language models (PTLMs) are "all the rage" right now. From the perspective of folks who have been working at the intersection of language, vision, and robotics since before it was cool, the noticeable impact is that researchers outside NLP feel like they should plug language into their work. However, these models are trained exclusively on text data, usually only for next-word prediction, or, at best, for next-word prediction refined under a fine-tuned words-as-actions policy with thousands of underpaid human annotators in the loop (e.g., RLHF). Even when a PTLM is "multimodal," that usually means "training also involved images and their captions, which describe the literal content of the image." What meaning can we hope to extract from those kinds of models in the context of embodied, interactive systems? In this talk, I'll cover some applications our lab has worked through in the space of language and embodied systems, with a broader lens towards open questions about the limits and (in)appropriate applications of current PTLMs in those systems.