
Poster in Workshop: Workshop on Human-Machine Collaboration and Teaming

How to Talk so Robots will Learn: Instructions, Descriptions, and Alignment

Theodore R Sumers


Abstract:

From the earliest years of our lives, we use language to express our beliefs and desires. Being able to talk to artificial agents about our preferences would thus fulfill a central goal of value alignment. Yet today, we lack computational models explaining such flexible and abstract language use. To address this challenge, we consider social learning in a linear bandit setting and ask how a human might communicate preferences over behaviors (i.e., the reward function). We study two distinct types of language: instructions, which ground to concrete actions, and descriptions, which provide information about the reward function. To explain how humans use these forms of language, we suggest that speakers reason about both the known present state and unknown future states: instructions optimize for the present, while descriptions optimize for the future. We formalize this choice by extending reward design to consider a distribution over states. We then define a pragmatic listener agent that infers the speaker's reward function by reasoning about how the speaker expresses themselves. Our findings suggest that (1) descriptions afford stronger generalization than instructions, and (2) the notion of a latent speaker horizon allows for more robust value alignment from natural language input. We hope these insights help broaden the field's focus beyond instructions to more abstract, descriptive language.
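To make the setup concrete, the sketch below shows one way a pragmatic listener could infer a linear reward function from the two utterance types discussed in the abstract. It is a minimal illustration under assumptions of our own (a coarse grid over reward weights, hand-picked arm features, a softmax-rational speaker over a binary truth-conditional utility), not the paper's actual model or code.

```python
# Illustrative sketch (assumptions, not the paper's implementation):
# a pragmatic listener infers reward weights w in a linear bandit from
# either an "instruction" (naming the best arm in the current state) or
# a "description" (naming the sign of one reward weight).

import itertools
import numpy as np

# Current state: each arm is a feature vector phi(a) in R^d (linear bandit).
PHI = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.7, 0.7]])        # 3 arms, 2 features (made up for illustration)
N_ARMS, D = PHI.shape

# Hypothesis space over reward weights w (coarse grid for illustration).
W_GRID = np.array(list(itertools.product([-1.0, 0.0, 1.0], repeat=D)))

# Utterances: instructions name an arm; descriptions name a signed feature.
UTTERANCES = ([("instr", a) for a in range(N_ARMS)] +
              [("desc", f, s) for f in range(D) for s in (-1.0, 1.0)])

def literal_consistent(utt, w):
    """Literal semantics: is utterance `utt` true of reward weights `w`?"""
    if utt[0] == "instr":                       # "pull arm a"
        a = utt[1]
        return np.argmax(PHI @ w) == a          # a is optimal in this state
    _, f, s = utt                               # "feature f is good/bad"
    return np.sign(w[f]) == s

def speaker_probs(w, beta=4.0):
    """Softmax-rational speaker: prefers utterances that are true under w."""
    utilities = np.array([1.0 if literal_consistent(u, w) else 0.0
                          for u in UTTERANCES])
    p = np.exp(beta * utilities)
    return p / p.sum()

def pragmatic_listener(utt):
    """Posterior over w given the utterance, via Bayes over the speaker model."""
    idx = UTTERANCES.index(utt)
    prior = np.full(len(W_GRID), 1.0 / len(W_GRID))
    likelihood = np.array([speaker_probs(w)[idx] for w in W_GRID])
    post = prior * likelihood
    return post / post.sum()

if __name__ == "__main__":
    # An instruction constrains w only through the arms available right now...
    post_instr = pragmatic_listener(("instr", 2))
    # ...while a description constrains a weight directly, so it transfers
    # to unseen future states (the generalization gap noted in the abstract).
    post_desc = pragmatic_listener(("desc", 0, 1.0))
    print("E[w | instruction]:", post_instr @ W_GRID)
    print("E[w | description]:", post_desc @ W_GRID)
```

In this toy version, extending reward design to a distribution over states would amount to having the speaker score utterances by expected listener reward under sampled future feature matrices rather than the single PHI above; that horizon is the latent quantity the abstract refers to.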
