A Fourier perspective on the learning dynamics of neural networks: from sample complexities to mechanistic insights
Fabiola Ricci ⋅ Claudia Merger ⋅ Sebastian Goldt
Abstract
Neural networks trained with gradient-based methods exhibit a strong simplicity bias, learning simpler statistical features of their data before moving to more complex features. In this work, we study this bias from a Fourier perspective, motivated by the approximate translation-invariance and the characteristic power spectra of natural images. We first show experimentally that simple neural networks trained on image classification tasks first rely on amplitude information -- related to pairwise correlations between pixels -- before exploiting phase information, which encodes edges and higher-order correlations. To explain this phenomenon, we introduce a synthetic data model for translation-invariant inputs that allows precise control over the amplitudes and phases while remaining tractable. We rigorously establish that for isotropic and high-dimensional inputs, classifying them using only phase information is a genuinely hard task: online stochastic gradient descent cannot distinguish the structured inputs from noise within $n \ll N^3$ steps, and needs at least $n \gg N^3 \log^2{N}$ steps to do so. In contrast, we prove that for non-isotropic inputs with power-law spectra, the existence of a dominant principal subspace can dramatically accelerate the speed of learning, even if the Fourier amplitudes are shared among classes and do not help with classification. Simulations with two-layer networks trained on textures, and with deep convolutional networks on ImageNet, confirm this non-trivial interaction between amplitudes and phases, providing mechanistic insight into how deep neural networks can learn natural image distributions efficiently.
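The amplitude/phase decomposition underlying the abstract can be illustrated with a minimal sketch (not the paper's synthetic data model): a real image is split via the 2D FFT into an amplitude spectrum, which fixes the power spectrum and hence all pairwise pixel correlations, and a phase spectrum; combining the amplitudes of one image with the phases of another yields a hybrid whose second-order statistics come entirely from the first image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stand-in "images" (random arrays for illustration only).
x = rng.standard_normal((32, 32))
y = rng.standard_normal((32, 32))

# 2D Fourier transforms: amplitude |F| carries the power spectrum
# (pairwise correlations); phase angle(F) carries edge-like,
# higher-order structure.
Fx, Fy = np.fft.fft2(x), np.fft.fft2(y)

# Hybrid image: amplitude spectrum of x combined with phase spectrum of y.
# Because x and y are real, both spectra are Hermitian-symmetric, so the
# hybrid spectrum is too, and the inverse FFT is real up to rounding.
hybrid = np.fft.ifft2(np.abs(Fx) * np.exp(1j * np.angle(Fy))).real

# The hybrid inherits x's power spectrum exactly.
print(np.allclose(np.abs(np.fft.fft2(hybrid)), np.abs(Fx)))
```

In this construction, a classifier that relies only on amplitude information cannot tell `hybrid` apart from `x`, which mirrors the amplitude-first learning behaviour the abstract describes.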