A Framework for Understanding Learnability in Transformers
Abstract
Transformers consistently fail to learn certain simple functions such as Parity---which returns whether the input has an even number of ones---even when they can provably compute them with specific parameter settings. This gap between learnability and expressivity is particularly prominent for sensitive functions---functions whose output is likely to change if a single bit of the input is flipped. While prior work has established that transformers exhibit a bias toward low-sensitivity functions, the precise mechanism underlying this bias remains poorly understood. To shed light on this phenomenon, we study the geometry of transformers' parameter space. We show that sensitive functions---even when representable---occupy a vanishingly small region that random initialization is unlikely to reach. More specifically, we prove that randomly initialized transformers almost surely compute functions with many low-sensitivity inputs, where flipping a bit is unlikely to change the output. Our results provide a novel theoretical grounding for the empirical observation that transformers exhibit a strong bias toward low-sensitivity functions, shifting the focus from average sensitivity to the full sensitivity profile.
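To make the contrast concrete, the following is an illustrative sketch (not taken from the paper) of the sensitivity notion the abstract uses: the sensitivity of a Boolean function at an input is the number of single-bit flips that change the output, and Parity attains the maximum value at every input, whereas a function like AND has near-zero sensitivity on most inputs.

```python
# Illustrative sketch: sensitivity of Boolean functions on n-bit inputs.
# Parity's output flips whenever any single bit flips, so its sensitivity
# is n at every input; AND (used here as a hypothetical low-sensitivity
# contrast) changes output under a bit flip only near the all-ones input.
from itertools import product

def sensitivity(f, x):
    """Number of bit positions whose flip changes f's output on input x."""
    return sum(
        f(x[:i] + (1 - x[i],) + x[i + 1:]) != f(x)
        for i in range(len(x))
    )

def avg_sensitivity(f, n):
    """Average sensitivity of f over all 2^n Boolean inputs."""
    inputs = list(product((0, 1), repeat=n))
    return sum(sensitivity(f, x) for x in inputs) / len(inputs)

parity = lambda x: sum(x) % 2    # flips under every single-bit change
and_fn = lambda x: int(all(x))   # sensitive only near the all-ones input

n = 4
print(avg_sensitivity(parity, n))  # -> 4.0 (maximal: n at every input)
print(avg_sensitivity(and_fn, n))  # -> 0.5 (low on average)
```

Note that average sensitivity alone already separates the two functions, but the abstract's point is finer: the *distribution* of per-input sensitivities (the sensitivity profile) is what randomly initialized transformers constrain.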