Poster in Workshop: High-dimensional Learning Dynamics Workshop: The Emergence of Structure and Reasoning
Rank Minimization, Alignment and Weight Decay in Neural Networks
David Yunis · Kumar Kshitij Patel · Samuel Wheeler · Pedro Henrique Pamplona Savarese · Gal Vardi · Karen Livescu · Michael Maire · Matthew Walter
Abstract:
We empirically study the evolution of the singular values and vectors of neural network weights across a wide variety of practical architectures and domains, including CNNs for image classification, LSTMs for speech recognition, and Transformers for language modeling. Across these settings, we observe that (i) the largest singular values grow much faster than the rest, decreasing the effective ranks of the weight matrices, (ii) this growth correlates with increasing alignment between neighboring layers' top singular vectors, and (iii) weight decay promotes both phenomena. Since these architectures are far from idealized linear neural networks, our observations extend the predictions of existing theory to more practical settings.
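For concreteness, the sketch below illustrates the two quantities the abstract tracks; it is a minimal PyTorch illustration, not the authors' code. Effective rank is computed here as the exponential of the singular-value entropy (Roy & Vetterli's definition, one common choice that may differ from the paper's exact metric), and alignment as the overlap between one layer's top left singular vectors and the next layer's top right singular vectors. The function names, layer shapes, and the parameter k are illustrative assumptions.

```python
import torch

def effective_rank(W: torch.Tensor) -> float:
    # Exponential of the entropy of the normalized singular values
    # (Roy & Vetterli, 2007): one common proxy for effective rank.
    s = torch.linalg.svdvals(W)
    p = s / s.sum()
    return torch.exp(-(p * torch.log(p + 1e-12)).sum()).item()

def top_alignment(W_prev: torch.Tensor, W_next: torch.Tensor,
                  k: int = 1) -> torch.Tensor:
    # Overlap |V_next^T U_prev| between the top-k left singular vectors
    # of layer l and the top-k right singular vectors of layer l+1;
    # entries near 1 indicate aligned principal directions.
    U_prev, _, _ = torch.linalg.svd(W_prev, full_matrices=False)
    _, _, Vh_next = torch.linalg.svd(W_next, full_matrices=False)
    return (Vh_next[:k] @ U_prev[:, :k]).abs()

# Hypothetical usage on two adjacent layers (W_next @ W_prev composes):
W_prev = torch.randn(256, 128)
W_next = torch.randn(512, 256)
print(effective_rank(W_prev))         # near full rank at random init
print(top_alignment(W_prev, W_next))  # near 0 at init; expected to grow with training
```

At random initialization both metrics are uninformative (high effective rank, near-zero alignment); the abstract's claims concern how they evolve over training, so in practice one would log these values at checkpoints.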