Local Redundancy: An Information-Theoretic Measure of Plasticity from Synthetic Memorization
Abstract
Plasticity, a neural network's ability to adapt to new tasks, is critical for continual and transfer learning. Existing measures, such as effective rank, dead neuron fraction, and weight norm, lack theoretical grounding and correlate poorly with performance on new tasks. We introduce local redundancy, an information-theoretic measure derived from universal compression theory. We define local redundancy as the worst-case redundancy of a local model family (parameters in an infinitesimal neighborhood along gradient directions) and show that it is a principled measure of plasticity. Although local redundancy is intractable to compute exactly, we prove that the expected squared gradient norm on a synthetic memorization task provides an efficiently computable lower bound. Experiments on continual image classification and time series transfer learning demonstrate that local redundancy predicts downstream performance better than existing measures and enables selecting pretraining checkpoints in regimes where validation loss has plateaued.
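To make the abstract's key claim concrete, the sketch below shows one plausible way to estimate the expected squared gradient norm on a synthetic memorization task. This is an illustrative Monte Carlo estimator, not the paper's implementation: the function name, the choice of Gaussian inputs with uniformly random labels, the cross-entropy loss, and all shapes and batch sizes are assumptions made for the example.

```python
# Hypothetical sketch: estimate E[ ||grad_theta L(theta)||^2 ] on a synthetic
# memorization task (random inputs paired with random labels), the quantity the
# abstract states lower-bounds local redundancy. All names and shapes here are
# illustrative assumptions, not taken from the paper.
import torch
import torch.nn as nn


def squared_grad_norm_on_synthetic_task(model: nn.Module,
                                        input_shape=(3, 32, 32),
                                        num_classes=10,
                                        num_batches=8,
                                        batch_size=64,
                                        device="cpu"):
    """Monte Carlo estimate of the expected squared gradient norm over synthetic data."""
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    estimates = []
    for _ in range(num_batches):
        # Synthetic memorization batch: random inputs with random labels.
        x = torch.randn(batch_size, *input_shape, device=device)
        y = torch.randint(num_classes, (batch_size,), device=device)

        model.zero_grad(set_to_none=True)
        loss = criterion(model(x), y)
        loss.backward()

        # Sum of squared gradient entries across all parameters.
        sq_norm = sum((p.grad ** 2).sum().item()
                      for p in model.parameters() if p.grad is not None)
        estimates.append(sq_norm)
    return sum(estimates) / len(estimates)
```

Under the abstract's claim that this quantity lower-bounds local redundancy, one would compare such estimates across pretraining checkpoints and prefer those with larger values; the precise task construction and estimator used in the paper may differ.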