Poster in Workshop: High-dimensional Learning Dynamics Workshop: The Emergence of Structure and Reasoning
The Butterfly Effect: Tiny Perturbations Cause Neural Network Training to Diverge
Gül Sena Altintas · Devin Kwok · David Rolnick
Abstract:
Neural network training begins with a chaotic phase in which the network is sensitive to small perturbations, such as those caused by stochastic gradient descent (SGD). This sensitivity can cause identically initialized networks to diverge in both parameter space and functional similarity. However, the exact degree to which networks are sensitive to perturbation, and how this sensitivity changes as they transition out of the chaotic phase, is unclear. To address this uncertainty, we apply a controlled perturbation at a single point in training and measure its effect on otherwise identical training trajectories. We find that both the $L^2$ distance and the loss barrier (the increase in loss on a linear path between networks) between networks trained in this manner increase with the magnitude of the perturbation and with how early in training it is applied. Finally, we propose a conjecture relating the sensitivity of a network to how easily it can be permuted with respect to another network.
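As a rough illustration of the two divergence measures named in the abstract, the sketch below shows one way the $L^2$ parameter distance and the loss barrier along a linear interpolation path could be computed in PyTorch. This is a minimal sketch, not the authors' implementation; `model_a`, `model_b`, `loss_fn`, and `data_loader` are assumed placeholders for two trained copies of the same architecture, a loss function, and an evaluation data loader.

```python
# Minimal sketch (assumed, not the paper's code): comparing two networks that
# started from the same initialization but diverged after a small perturbation.
import copy
import torch


def l2_distance(model_a: torch.nn.Module, model_b: torch.nn.Module) -> float:
    """L^2 distance between the flattened parameter vectors of two networks."""
    vec_a = torch.cat([p.detach().flatten() for p in model_a.parameters()])
    vec_b = torch.cat([p.detach().flatten() for p in model_b.parameters()])
    return torch.linalg.norm(vec_a - vec_b).item()


@torch.no_grad()
def loss_barrier(model_a, model_b, loss_fn, data_loader, steps: int = 11) -> float:
    """Increase in loss on the linear path between model_a and model_b,
    relative to the mean loss of the two endpoints (batch-norm statistics
    are interpolated naively here, without recalibration)."""

    def average_loss(model):
        total, count = 0.0, 0
        for x, y in data_loader:
            total += loss_fn(model(x), y).item() * len(y)
            count += len(y)
        return total / count

    def interpolate(alpha):
        mixed = copy.deepcopy(model_a)
        for p_mix, p_a, p_b in zip(mixed.parameters(),
                                   model_a.parameters(),
                                   model_b.parameters()):
            p_mix.data.copy_((1 - alpha) * p_a.data + alpha * p_b.data)
        return mixed

    endpoint_loss = 0.5 * (average_loss(model_a) + average_loss(model_b))
    path_losses = [average_loss(interpolate(a))
                   for a in torch.linspace(0.0, 1.0, steps).tolist()]
    return max(path_losses) - endpoint_loss
```

Under this reading, a larger or earlier perturbation would show up as a larger return value from both `l2_distance` and `loss_barrier` when applied to the two resulting trajectories.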