Poster in Workshop: Workshop on Theory of Mind in Communicating Agents
EPITOME: Experimental Protocol Inventory for Theory Of Mind Evaluation
Cameron Jones · Sean Trott · Ben Bergen
Keywords: [ theory of mind ] [ large language models ] [ distributional information ] [ social cognition ]
We address a growing debate about the extent to which large language models (LLMs) produce behavior consistent with Theory of Mind (ToM) in humans. We present EPITOME: a battery of six experiments that tap diverse ToM capacities, including belief attribution, emotional inference, pragmatic reasoning, and non-literal communication. For each task, we compare responses from five LLMs to a baseline of responses from human comprehenders. Results are mixed. LLMs show broad sensitivity to mental state information and perform at parity with humans on several tasks. However, models make systematic errors on other tasks, especially those that require pragmatic reasoning from mental state information. Such inconsistent performance suggests that crediting LLMs with ToM may be premature.