A Framework for Understanding Learnability in Transformers
Abstract
Transformers consistently fail to learn certain simple functions such as Parity---which returns whether the input has an even number of ones---even when they can provably compute them with specific parameter settings. This gap between learnability and expressivity is particularly prominent for sensitive functions---functions whose output is likely to change if a single bit of the input is flipped. While prior work has established that transformers exhibit a bias toward low-sensitivity functions, the precise mechanism underlying this bias remains poorly understood. To shed light on this phenomenon, we study the geometry of transformers' parameter space. We show that sensitive functions---even when representable---occupy a vanishingly small region that random initialization is unlikely to reach. More specifically, we prove that randomly initialized transformers almost surely compute functions with many low-sensitivity inputs, where flipping a bit is unlikely to change the output. Our results provide a novel theoretical grounding for the empirical observation that transformers exhibit a strong bias toward low-sensitivity functions, shifting the focus from average sensitivity to the full sensitivity profile.
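To make the contrast concrete, the following is an illustrative sketch (not taken from the paper) of the sensitivity notion the abstract uses: the sensitivity of a Boolean function at an input is the number of single-bit flips that change the output, and Parity attains the maximum value at every input, whereas a function like AND has near-zero sensitivity on most inputs.

```python
# Illustrative sketch: sensitivity of Boolean functions on n-bit inputs.
# Parity's output flips whenever any single bit flips, so its sensitivity
# is n at every input; AND (used here as a hypothetical low-sensitivity
# contrast) changes output under a bit flip only near the all-ones input.
from itertools import product

def sensitivity(f, x):
    """Number of bit positions whose flip changes f's output on input x."""
    return sum(
        f(x[:i] + (1 - x[i],) + x[i + 1:]) != f(x)
        for i in range(len(x))
    )

def avg_sensitivity(f, n):
    """Average sensitivity of f over all 2^n Boolean inputs."""
    inputs = list(product((0, 1), repeat=n))
    return sum(sensitivity(f, x) for x in inputs) / len(inputs)

parity = lambda x: sum(x) % 2    # flips under every single-bit change
and_fn = lambda x: int(all(x))   # sensitive only near the all-ones input

n = 4
print(avg_sensitivity(parity, n))  # -> 4.0 (maximal: n at every input)
print(avg_sensitivity(and_fn, n))  # -> 0.5 (low on average)
```

Note that average sensitivity alone already separates the two functions, but the abstract's point is finer: the *distribution* of per-input sensitivities (the sensitivity profile) is what randomly initialized transformers constrain.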