Dropout Universality: Scaling Laws and Optimal Scheduling at the Edge-of-Chaos
Abstract
We develop a mean-field theory of dropout as a perturbation of critical signal propagation at the edge of chaos. Dropout shifts the perfect-alignment fixed point, making the depth scale for information propagation finite even at critical initialization. We derive critical and crossover scaling laws for correlation decay and establish that smooth activations and kinked (ReLU-type) activations constitute distinct universality classes, with different critical exponents and a universal two-parameter scaling collapse in detuning and dropout strength. The distinction traces to the analytic structure of the correlation map: smooth activations admit a Taylor expansion near perfect alignment, while kinked activations develop a branch point with universal non-analyticity. As a corollary, the framework yields principled dropout schedules that maximize the effective correlation length under a fixed dropout budget. We validate the theoretical predictions in MLPs and Vision Transformers, where the predicted schedules outperform constant-rate dropout, illustrating the practical utility of the mean-field approach.
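As an illustrative sketch (our own, not the paper's code), the claim that dropout destroys the perfect-alignment fixed point can be checked numerically with the standard mean-field recursion for a tanh network. Units are kept with probability q and rescaled by 1/q; since the same mask acts on both inputs, the variance map gains a 1/q factor while the covariance map does not, so c = 1 is no longer a fixed point and the iterated correlation settles at c* < 1. The specific initialization values (`sw2`, `sb2`, `q`) below are illustrative near-critical choices, not taken from the paper.

```python
import numpy as np

def gauss_avg(f, n=101):
    # E[f(z)] for z ~ N(0, 1), via Gauss-Hermite (probabilists') quadrature.
    z, w = np.polynomial.hermite_e.hermegauss(n)
    return (w * f(z)).sum() / w.sum()

def variance_map(v, sw2, sb2, q):
    # Dropout rescales kept units by 1/q, so the variance map carries a 1/q factor.
    phi2 = gauss_avg(lambda z: np.tanh(np.sqrt(v) * z) ** 2)
    return sw2 / q * phi2 + sb2

def corr_map(c, v, sw2, sb2, q, n=101):
    # E[phi(u1) phi(u2)] for (u1, u2) jointly Gaussian with variance v, correlation c.
    z, w = np.polynomial.hermite_e.hermegauss(n)
    Z1, Z2 = np.meshgrid(z, z)
    W = np.outer(w, w) / w.sum() ** 2
    u1 = np.sqrt(v) * Z1
    u2 = np.sqrt(v) * (c * Z1 + np.sqrt(max(1.0 - c ** 2, 0.0)) * Z2)
    cov = (W * np.tanh(u1) * np.tanh(u2)).sum()
    # The covariance map has no 1/q factor, so at c = 1 the output correlation is < 1.
    return (sw2 * cov + sb2) / v

sw2, sb2, q = 1.76, 0.1, 0.9   # illustrative near-critical tanh init; q = keep prob.
v = 1.0
for _ in range(200):            # converge the variance map to its fixed point
    v = variance_map(v, sw2, sb2, q)

c = 0.99
for _ in range(50):             # iterate the correlation map
    c = corr_map(c, v, sw2, sb2, q)
print(c)  # with dropout (q < 1), c settles strictly below 1
```

Running the same iteration with q = 1 recovers the usual edge-of-chaos behavior in which c = 1 remains a fixed point; any q < 1 makes the depth scale for correlation propagation finite, in line with the abstract's claim.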