

Poster

On the Origins of Linear Representations in Large Language Models

Yibo Jiang · Goutham Rajendran · Pradeep Ravikumar · Bryon Aragam · Victor Veitch

Hall C 4-9 #2200
Wed 24 Jul 4:30 a.m. PDT — 6 a.m. PDT

Abstract:

A number of recent works have argued that high-level semantic concepts are encoded "linearly" in the representation space of large language models. In this work, we study the origins of such linear representations. To that end, we introduce a latent variable model to abstract and formalize the concept dynamics of next token prediction. We use this formalism to prove that linearity arises as a consequence of the loss function and the implicit bias of gradient descent. We further substantiate the theory with experiments.
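For readers unfamiliar with the linear representation hypothesis the abstract builds on, the following minimal sketch illustrates on synthetic data what it means for a binary concept to be encoded linearly: representations of the two concept values differ along a fixed direction, so a simple linear probe recovers both the concept and the direction. The dimension, shift magnitude, and least-squares probe are illustrative assumptions standing in for LLM hidden states, not the paper's construction or experiments.

# Minimal sketch of a linearly encoded binary concept (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 1000                              # hypothetical representation dim, sample count

# A fixed unit direction along which the concept is encoded.
w_concept = rng.normal(size=d)
w_concept /= np.linalg.norm(w_concept)

labels = rng.integers(0, 2, size=n)          # binary concept values
base = rng.normal(size=(n, d))               # concept-free content (stand-in for hidden states)
reps = base + 4.0 * labels[:, None] * w_concept  # shift one class along the concept direction
reps = reps - reps.mean(axis=0)              # center so an intercept-free probe suffices

# Fit a linear probe by least squares on {-1, +1} targets.
y = (2 * labels - 1).astype(float)
w_probe, *_ = np.linalg.lstsq(reps, y, rcond=None)

acc = ((reps @ w_probe > 0).astype(int) == labels).mean()
cos = (w_probe @ w_concept) / np.linalg.norm(w_probe)
print(f"probe accuracy: {acc:.3f}, cosine to true direction: {cos:.3f}")

Because the concept enters the representations as a rank-one shift, the recovered probe weights align closely with the true direction (cosine near 1), which is the behavior the paper explains as arising from the loss function and the implicit bias of gradient descent.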
