Spectral Reach: Understanding Neural Scaling through Kernel Alignment Dynamics
Abstract
Neural scaling laws describe predictable power-law relationships between model size, dataset size, compute cost, and performance. While these laws are routinely used to improve the performance of modern foundation models, the mechanisms underpinning them remain poorly understood, in part due to the absence of scalable analysis tools. To this end, we introduce a framework for efficiently measuring the alignment between the empirical neural tangent kernel (eNTK) and the loss residuals. Applying this framework to scaling experiments reveals a consistent pattern: larger and better-performing models exhibit lower kernel alignment throughout training. We interpret this reduced alignment through the lens of spectral reach: the capacity of a model to learn from progressively weaker spectral modes of its eNTK. This interpretation explains why larger models achieve lower losses: they sustain learning on weaker signals that smaller models cannot access. We further demonstrate that feature learning improves spectral reach and provide a mechanistic explanation of how this occurs, suggesting practical avenues for performance improvement.
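The kernel alignment quantity at the heart of the framework can be illustrated with a minimal sketch. Here we assume the standard kernel-target alignment form, A(K, r) = r^T K r / (||K||_F ||r||^2), applied to a kernel matrix K and a residual vector r; the function name `kernel_alignment` and the toy random-feature kernel are illustrative choices, not the paper's actual implementation:

```python
import numpy as np

def kernel_alignment(K, r):
    """Alignment between a PSD kernel matrix K (n x n) and a residual
    vector r (n,): r^T K r / (||K||_F * ||r||^2). Values near 1 mean r
    lies along K's strong spectral modes; values near 0 mean r lies in
    weak modes that gradient descent under K learns slowly."""
    return float(r @ K @ r) / (np.linalg.norm(K, "fro") * float(r @ r))

# Toy stand-in for an eNTK: a low-rank PSD kernel from random features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
K = X @ X.T

# Compare a residual along the strongest eigenmode of K with one along
# the weakest: alignment is high for the former, low for the latter.
eigvals, eigvecs = np.linalg.eigh(K)   # ascending eigenvalues
a_strong = kernel_alignment(K, eigvecs[:, -1])
a_weak = kernel_alignment(K, eigvecs[:, 0])
print(a_strong, a_weak)
```

In this picture, a model with greater spectral reach is one whose training dynamics continue to reduce residual components that, like `eigvecs[:, 0]` here, have small alignment with the current kernel.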