In this work, we investigate whether large language models (LLMs) exhibit one of the earliest Theory of Mind-like behaviors: selectively encoding the goal object of an actor's reach (Woodward, 1998). We prompt state-of-the-art LLMs with ambiguous examples that can be explained by either an object or a location being the goal of an actor's reach, and evaluate each model's bias. We compare the magnitude of the bias in three situations: i) an agent is acting purposefully, ii) an inanimate object is acted upon, and iii) an agent is acting accidentally. We find that two models show a selective bias for agents acting purposefully, but are biased differently than humans. Additionally, the encoding is not robust to semantically equivalent prompt variations. We discuss how this bias compares to the bias infants show and provide a cautionary tale of evaluating machine Theory of Mind (ToM). We release our dataset and code.
Author Information
Laura Ruis (University College London)
Arduin Findeis (University of Cambridge)
I am a PhD candidate in the Department of Computer Science at the University of Cambridge. My research focuses on the evaluation of applied machine learning (ML) systems. Much of my work is centred around creating standardised benchmark tools for specific problems, to help accelerate progress on these problems. My current work focuses on the evaluation of language models. Previously, I also worked on the evaluation of (meta) reinforcement learning (RL) methods in the context of building control systems. I am part of the AI4ER UKRI Centre for Doctoral Training (CDT). Prior to joining my current PhD programme, I completed an MPhil in machine learning and machine intelligence in Cambridge and an undergraduate degree in mathematics at the University of Edinburgh.
Herbie Bradley (EleutherAI / University of Cambridge)
Hossein A. Rahmani (University College London)
PhD student at UCL
Kyoung Whan Choe (Carper AI)
Edward Grefenstette (Facebook AI Research & UCL)
Tim Rocktäschel (Facebook AI Research & University College London)
More from the Same Authors
-
2023 Poster: Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling »
Stella Biderman · Hailey Schoelkopf · Quentin Anthony · Herbie Bradley · Kyle O'Brien · Eric Hallahan · Mohammad Aflah Khan · Shivanshu Purohit · USVSN Sai Prashanth · Edward Raff · Aviya Skowron · Lintang Sutawika · Oskar van der Wal -
2023 Oral: Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling »
Stella Biderman · Hailey Schoelkopf · Quentin Anthony · Herbie Bradley · Kyle O'Brien · Eric Hallahan · Mohammad Aflah Khan · Shivanshu Purohit · USVSN Sai Prashanth · Edward Raff · Aviya Skowron · Lintang Sutawika · Oskar van der Wal -
2023 Oral: Human-Timescale Adaptation in an Open-Ended Task Space »
Jakob Bauer · Kate Baumli · Feryal Behbahani · Avishkar Bhoopchand · Natalie Bradley-Schmieg · Michael Chang · Natalie Clay · Adrian Collister · Vibhavari Dasagi · Lucy Gonzalez · Karol Gregor · Edward Hughes · Sheleem Kashem · Maria Loks-Thompson · Hannah Openshaw · Jack Parker-Holder · Shreya Pathak · Nicolas Perez-Nieves · Nemanja Rakicevic · Tim Rocktäschel · Yannick Schroecker · Satinder Singh · Jakub Sygnowski · Karl Tuyls · Sarah York · Alexander Zacherl · Lei Zhang -
2023 Poster: Human-Timescale Adaptation in an Open-Ended Task Space »
Jakob Bauer · Kate Baumli · Feryal Behbahani · Avishkar Bhoopchand · Natalie Bradley-Schmieg · Michael Chang · Natalie Clay · Adrian Collister · Vibhavari Dasagi · Lucy Gonzalez · Karol Gregor · Edward Hughes · Sheleem Kashem · Maria Loks-Thompson · Hannah Openshaw · Jack Parker-Holder · Shreya Pathak · Nicolas Perez-Nieves · Nemanja Rakicevic · Tim Rocktäschel · Yannick Schroecker · Satinder Singh · Jakub Sygnowski · Karl Tuyls · Sarah York · Alexander Zacherl · Lei Zhang -
2022 Poster: Evolving Curricula with Regret-Based Environment Design »
Jack Parker-Holder · Minqi Jiang · Michael Dennis · Mikayel Samvelyan · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel -
2022 Spotlight: Evolving Curricula with Regret-Based Environment Design »
Jack Parker-Holder · Minqi Jiang · Michael Dennis · Mikayel Samvelyan · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel -
2021 Poster: Prioritized Level Replay »
Minqi Jiang · Edward Grefenstette · Tim Rocktäschel -
2021 Spotlight: Prioritized Level Replay »
Minqi Jiang · Edward Grefenstette · Tim Rocktäschel -
2020 : The NetHack Learning Environment Q&A »
Tim Rocktäschel · Katja Hofmann -
2020 : The NetHack Learning Environment »
Tim Rocktäschel -
2020 Workshop: 1st Workshop on Language in Reinforcement Learning (LaReL) »
Nantas Nardelli · Jelena Luketina · Jakob Foerster · Victor Zhong · Jacob Andreas · Tim Rocktäschel · Edward Grefenstette -
2020 Poster: Learning Reasoning Strategies in End-to-End Differentiable Proving »
Pasquale Minervini · Sebastian Riedel · Pontus Stenetorp · Edward Grefenstette · Tim Rocktäschel -
2019 Poster: A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs »
Jingkai Mao · Jakob Foerster · Tim Rocktäschel · Maruan Al-Shedivat · Gregory Farquhar · Shimon Whiteson -
2019 Poster: CompILE: Compositional Imitation Learning and Execution »
Thomas Kipf · Yujia Li · Hanjun Dai · Vinicius Zambaldi · Alvaro Sanchez-Gonzalez · Edward Grefenstette · Pushmeet Kohli · Peter Battaglia -
2019 Oral: CompILE: Compositional Imitation Learning and Execution »
Thomas Kipf · Yujia Li · Hanjun Dai · Vinicius Zambaldi · Alvaro Sanchez-Gonzalez · Edward Grefenstette · Pushmeet Kohli · Peter Battaglia -
2019 Oral: A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs »
Jingkai Mao · Jakob Foerster · Tim Rocktäschel · Maruan Al-Shedivat · Gregory Farquhar · Shimon Whiteson -
2017 Poster: Discovering Discrete Latent Topics with Neural Variational Inference »
Yishu Miao · Edward Grefenstette · Phil Blunsom -
2017 Talk: Discovering Discrete Latent Topics with Neural Variational Inference »
Yishu Miao · Edward Grefenstette · Phil Blunsom -
2017 Poster: Programming with a Differentiable Forth Interpreter »
Matko Bošnjak · Tim Rocktäschel · Jason Naradowsky · Sebastian Riedel -
2017 Talk: Programming with a Differentiable Forth Interpreter »
Matko Bošnjak · Tim Rocktäschel · Jason Naradowsky · Sebastian Riedel