Invited Talk

AI's Models of the World, and Ours

Jon Kleinberg

2025 Invited Talk

Abstract

Many different threads in recent work on generative AI address the simultaneous challenge of evaluating an AI system's explicit behavior at one level and its implicit representations of the world at another. Such distinctions become crucial as we interact with powerful AI systems, where a mismatch between the system's model of the world and our model of the world can lead to measurable situations in which the system has inadvertently 'set us up to fail' through our interaction with it. We explore these questions through the lens of generation, drawing on examples from game-playing, geographic navigation, and other complex tasks: When we train a model to win chess games, what happens when we pair it with a weaker partner who makes some of the moves? When we train a model to find shortest paths, what happens when it has to deal with unexpected detours? The picture we construct is further complicated by theoretical results indicating that successful generation can be achieved even by agents that are provably incapable of identifying the model they're generating from.

The talk will include joint work with Ashton Anderson, Karim Hamade, Reid McIlroy-Young, Siddhartha Sen, Justin Chen, Sendhil Mullainathan, Ashesh Rambachan, Keyon Vafa, and Fan Wei.
