Poster

Language Models Represent Beliefs of Self and Others

Wentao Zhu · Zhining Zhang · Yizhou Wang


Abstract:

Understanding and attributing mental states, a capability known as Theory of Mind (ToM), is fundamental for social reasoning. Although Large Language Models (LLMs) exhibit certain ToM abilities, the mechanisms through which they achieve this remain poorly understood. In this study, we discover that the model's intermediate activations can linearly separate the belief status from the perspectives of different agents, suggesting the presence of internal representations of self and others' beliefs. By manipulating these representations, we observe dramatic changes in the models' ToM performance, highlighting their critical role in the social reasoning process. Our findings also extend to different tasks that involve varied causal inference patterns, indicating the potential generalizability of these representations.
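The abstract's claim of linear separability is the kind of result usually established with a linear probe trained on intermediate activations. The sketch below is an illustration of that general technique, not the authors' code: the activations and belief labels are random placeholders, and the variable names (activations, belief_labels, belief_direction) are hypothetical. In practice the features would be hidden states read out at a chosen layer while the model processes annotated false-belief stories.

# Minimal linear-probe sketch (placeholder data, hypothetical names).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_size, n_examples = 4096, 1000

# Placeholder activations and binary labels (1 = the agent believes the
# proposition holds, 0 = does not). Real labels would come from annotated
# ToM stories; real activations from a chosen transformer layer.
activations = rng.normal(size=(n_examples, hidden_size))
belief_labels = rng.integers(0, 2, size=n_examples)

X_train, X_test, y_train, y_test = train_test_split(
    activations, belief_labels, test_size=0.2, random_state=0
)

# If belief status is linearly separable in activation space, a simple
# logistic-regression probe should reach high held-out accuracy.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))

# The probe's weight vector gives a candidate "belief direction"; adding or
# subtracting a scaled copy of it from the activations is one common way to
# test whether manipulating the representation changes ToM behavior.
belief_direction = probe.coef_[0] / np.linalg.norm(probe.coef_[0])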
