Probing the Inductive Bias of Neural Networks through Learning Random Cellular Automata
Abstract
Why do neural networks generalize well on natural data? Natural data originates from processes subject to specific physical constraints, such as temporal and spatial invariance, that make it easier to learn. We investigate whether these properties suffice, using 2D cellular automata as a controlled testbed: systems that are perfectly local, symmetric, and deterministic. We find that these conditions alone are \textit{not sufficient} for a network to learn to predict the $k$-step evolution of a cellular automaton. We then examine smoothness (average sensitivity) as an additional criterion and find it predictive but still incomplete. Finally, we introduce a circuit complexity perspective, hypothesizing that natural functions are computable by small circuits. Junta coefficients, which measure how Fourier weight is concentrated across interaction degrees, provide a tighter predictor of learnability and a correspondence to combinatorial complexity. Across architectures (CNNs, transformers, MLPs), the learnable functions are predominantly those whose spectral weight is concentrated at low degrees and that therefore have low complexity. These results are consistent with the hypothesis that natural data is learnable because natural dynamics filter out complex, high-degree interactions.
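For context, a minimal sketch of the standard degree-weight decomposition from Boolean Fourier analysis, which the abstract's notion of ``junta coefficients'' presumably builds on; the exact definition used in the paper may differ. For a function $f:\{-1,1\}^n \to \mathbb{R}$ with Fourier expansion
\[
  f(x) \;=\; \sum_{S \subseteq [n]} \hat{f}(S)\,\chi_S(x), \qquad \chi_S(x) \;=\; \prod_{i \in S} x_i,
\]
the Fourier weight at degree $k$ and at degrees up to $k$ is
\[
  W^{k}[f] \;=\; \sum_{|S| = k} \hat{f}(S)^2, \qquad W^{\le k}[f] \;=\; \sum_{|S| \le k} \hat{f}(S)^2 .
\]
Concentration of $W^{\le k}[f]$ at small $k$ indicates that $f$ is dominated by low-degree (few-variable) interactions.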