Poster in Workshop: Workshop on Theory of Mind in Communicating Agents

EPITOME: Experimental Protocol Inventory for Theory Of Mind Evaluation

Cameron Jones · Sean Trott · Ben Bergen

Keywords: [ social cognition ] [ distributional information ] [ large language models ] [ theory of mind ]


Abstract:

We address a growing debate about the extent to which large language models (LLMs) produce behavior consistent with Theory of Mind (ToM) in humans. We present EPITOME: a battery of six experiments that tap diverse ToM capacities, including belief attribution, emotional inference, pragmatic reasoning, and non-literal communication. For each task we compare responses from five LLMs to a baseline of responses from human comprehenders. Results are mixed. LLMs show broad sensitivity to mental state information and perform at parity with humans across several tasks. However, models make systematic errors in other tasks, especially those that require pragmatic reasoning from mental state information. Such inconsistent performance suggests that crediting LLMs with ToM may be premature.