Preserving Plasticity in Continual Learning via Dynamical Isometry
Abstract
Continual training of deep neural networks under non-stationarity often leads to a progressive loss of plasticity, eventually limiting further learning. We define plasticity loss as the network’s diminishing ability to make reliable progress under gradient descent updates in all output-space directions, and identify departures from dynamical isometry (i.e., drift of the layer-wise Jacobian singular values away from one) as a key mechanism driving this loss. We first revisit a class of networks that are isometric almost everywhere while remaining universal Lipschitz function approximators, demonstrating that isometry is compatible with expressive nonlinear function classes. Turning to more general architectures, we study an efficient isometry-promoting regularization scheme for continual learning. We analyze its interaction with common activation functions, and reveal a mechanism by which it can reactivate dead ReLU units. To integrate this regularization with adaptive optimization, we propose AdamO, an Adam-style optimizer that decouples isometric regularization from gradient updates, analogous to AdamW. Finally, we evaluate our methods in supervised and reinforcement learning settings designed to induce plasticity loss and show that they effectively preserve plasticity while also yielding strong performance.
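To illustrate the decoupling the abstract describes, the sketch below shows an AdamW-style update in which an isometry-promoting term is applied outside the Adam moment estimates. This is a minimal NumPy illustration, not the paper's actual AdamO algorithm: the penalty form (a Frobenius-norm orthogonality penalty ||W Wᵀ − I||²_F on a square weight matrix), the hyperparameter names, and the per-matrix treatment are all assumptions made for the example.

```python
import numpy as np

def isometry_grad(W):
    """Gradient of the assumed penalty 0.5 * ||W W^T - I||_F^2,
    which is 2 * (W W^T - I) W for square W."""
    return 2.0 * (W @ W.T - np.eye(W.shape[0])) @ W

def adamo_step(W, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, lam=1e-2):
    """One Adam-style step; the isometry term is decoupled, i.e. it does
    not pass through the first/second moment estimates (cf. AdamW's
    treatment of weight decay)."""
    # Adam moments are computed from the task gradient only.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    mhat = m / (1 - b1 ** t)
    vhat = v / (1 - b2 ** t)
    # Decoupled isometric regularization added directly to the update.
    W = W - lr * mhat / (np.sqrt(vhat) + eps) - lr * lam * isometry_grad(W)
    return W, m, v
```

With the task gradient set to zero, repeated steps perform plain gradient descent on the orthogonality penalty, pushing the singular values of W toward one; this separation is what keeps the regularization strength independent of the adaptive per-parameter scaling.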