

Poster in Workshop: Workshop on Theory of Mind in Communicating Agents

Do LLMs selectively encode the goal of an agent's reach?

Laura Ruis · Arduin Findeis · Herbie Bradley · Hossein A. Rahmani · Kyoung Whan Choe · Edward Grefenstette · Tim Rocktäschel

Keywords: [ theory of mind ] [ Woodward ] [ LLM ] [ language ] [ agents ]


Abstract:

In this work, we investigate whether large language models (LLMs) exhibit one of the earliest Theory of Mind-like behaviors: selectively encoding the goal object of an actor's reach (Woodward, 1998). We prompt state-of-the-art LLMs with ambiguous examples that can be explained either by an object or by a location being the goal of an actor's reach, and evaluate each model's bias. We compare the magnitude of the bias in three situations: i) an agent acting purposefully, ii) an inanimate object being acted upon, and iii) an agent acting accidentally. We find that two models show a selective bias for agents acting purposefully, but are biased differently from humans. Additionally, the encoding is not robust to semantically equivalent prompt variations. We discuss how this bias compares to the bias infants show and provide a cautionary tale about evaluating machine Theory of Mind (ToM). We release our dataset and code.
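As a rough illustration of the evaluation paradigm described above, the sketch below scores two continuations of a Woodward-style ambiguous reach scenario by their token log-probabilities under a causal language model. This is not the authors' released code: the model name (gpt2 as a stand-in for the state-of-the-art LLMs evaluated in the paper), the prompt wording, and the scoring rule are all illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's released code) of measuring
# an object-vs-location bias for an ambiguous reach, using continuation
# log-probabilities under a Hugging Face causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in; the paper evaluates state-of-the-art LLMs
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

# Woodward-style ambiguous scene: the reach is consistent with either the
# *object* (the ball) or the *location* (the left side) being the goal.
# After the objects swap places, the two continuations disambiguate.
prompt = (
    "A ball sits on the left and a bear sits on the right. "
    "An actor reaches to the left and grabs the ball. "
    "Now the ball and the bear swap places. "
    "The actor reaches for the"
)
continuations = {"object": " ball", "location": " bear"}

def continuation_logprob(prefix: str, continuation: str) -> float:
    """Sum of log-probabilities the model assigns to the continuation tokens."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    full_ids = tokenizer(prefix + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # The token at position i is predicted by the logits at position i - 1.
    for i in range(prefix_ids.shape[1], full_ids.shape[1]):
        token_id = full_ids[0, i]
        total += log_probs[0, i - 1, token_id].item()
    return total

scores = {k: continuation_logprob(prompt, v) for k, v in continuations.items()}
# A positive margin indicates an object bias (the goal encoded as the ball),
# mirroring the selectivity infants show for purposeful reaches.
print(scores, "object-bias margin:", scores["object"] - scores["location"])
```

Under this assumed setup, comparing the margin across the paper's three conditions (purposeful agent, inanimate object, accidental action) would test whether the bias is selective for purposeful agency; the paper's released dataset and code define the actual prompts and conditions.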
