

Poster

Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks

Atli Kosson · Bettina Messmer · Martin Jaggi

Hall C 4-9 #304
Thu 25 Jul 2:30 a.m. PDT — 4 a.m. PDT

Abstract:

This study investigates how weight decay affects the update behavior of individual neurons in deep neural networks through a combination of applied analysis and experimentation. Weight decay can cause the expected magnitude and angular updates of a neuron's weight vector to converge to a steady state we call rotational equilibrium. These states can be highly homogeneous, effectively balancing the average rotation, a proxy for the effective learning rate, across different layers and neurons. Our work analyzes these dynamics across optimizers like Adam, Lion, and SGD with momentum, offering a simple new perspective on training that elucidates the efficacy of widely used but poorly understood methods in deep learning. We demonstrate how balanced rotation plays a key role in the effectiveness of normalization methods like Weight Standardization, as well as that of AdamW over Adam with L2 regularization. Finally, we show that explicitly controlling the rotation provides the benefits of weight decay while substantially reducing the need for learning rate warmup.
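As a rough illustration of the quantity the abstract calls the angular update (rotation) of a neuron's weight vector, the PyTorch sketch below measures the angle between each neuron's weights before and after a single optimizer step. It is not the paper's method; the function name `per_neuron_rotation`, the toy layer, and the AdamW hyperparameters are illustrative assumptions.

```python
import torch

def per_neuron_rotation(w_before: torch.Tensor, w_after: torch.Tensor) -> torch.Tensor:
    """Angle (radians) between each neuron's weight vector before and after one step.

    Assumes the first dimension of the weight tensor indexes neurons
    (e.g. the rows of a Linear layer's weight matrix).
    """
    a = w_before.flatten(1)
    b = w_after.flatten(1)
    cos = torch.nn.functional.cosine_similarity(a, b, dim=1)
    return torch.arccos(cos.clamp(-1.0, 1.0))

# Toy usage: track the average rotation of one layer over a single AdamW step.
layer = torch.nn.Linear(256, 128)
opt = torch.optim.AdamW(layer.parameters(), lr=1e-3, weight_decay=0.1)

x = torch.randn(32, 256)
w_prev = layer.weight.detach().clone()
loss = layer(x).pow(2).mean()
loss.backward()
opt.step()

angles = per_neuron_rotation(w_prev, layer.weight.detach())
print(f"mean angular update: {angles.mean():.4f} rad")
```

Averaging such per-step angles over training gives the kind of per-layer rotation statistic that the abstract describes as being balanced in rotational equilibrium.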
