Poster in Workshop: High-dimensional Learning Dynamics Workshop: The Emergence of Structure and Reasoning

A Phase Transition between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention

Hugo Cui · Freya Behrens · Florent Krzakala · Lenka Zdeborova


Abstract:

A theoretical understanding of how algorithmic abilities emerge in the learning of language models remains elusive. In this work, we provide a tight theoretical analysis of the emergence of semantic attention in a solvable model of dot-product attention. Specifically, we consider a non-linear self-attention layer with trainable, tied, low-rank query and key matrices. In the asymptotic limit of high-dimensional data and a comparably large number of training samples, we provide a tight closed-form characterization of the global minimum of the non-convex empirical loss landscape. We show that this minimum corresponds to either a positional attention mechanism (with tokens attending to each other based on their respective positions) or a semantic attention mechanism (with tokens attending to each other based on their meaning), and evidence an emergent phase transition from the former to the latter with increasing sample complexity.
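As an illustration of the kind of layer the abstract describes, the following is a minimal NumPy sketch of dot-product self-attention with tied, low-rank query and key matrices. It is not the authors' implementation: the softmax non-linearity, the 1/sqrt(d) scaling, the Gaussian toy data, and all names (`tied_low_rank_attention`, `X`, `Q`, `r`) are assumptions made here for illustration only.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def tied_low_rank_attention(X, Q):
    """Self-attention with one shared (tied) query/key matrix Q.

    X: (L, d) array of L tokens in dimension d.
    Q: (d, r) array with r much smaller than d (low rank).
    The softmax and the 1/sqrt(d) scaling are assumptions of this sketch.
    """
    d = X.shape[1]
    H = X @ Q / np.sqrt(d)      # (L, r) projected tokens
    scores = H @ H.T            # (L, L), symmetric because query = key
    A = softmax(scores, axis=-1)  # each row is an attention distribution
    return A @ X, A


# Toy data (hypothetical): 4 tokens, dimension 64, rank-1 query/key.
rng = np.random.default_rng(0)
L, d, r = 4, 64, 1
X = rng.standard_normal((L, d))
Q = rng.standard_normal((d, r))
Y, A = tied_low_rank_attention(X, Q)
```

In this sketch the attention matrix `A` depends on the tokens only through the projection `X @ Q`, so its structure is determined entirely by what the single trainable matrix `Q` has learned to pick up from the inputs.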
