

Poster in Workshop: 2nd Workshop on Advancing Neural Network Training: Computational Efficiency, Scalability, and Resource Optimization (WANT@ICML 2024)

DASH: Warm-Starting Neural Network Training Without Loss of Plasticity Under Stationarity

Baekrok Shin · Junsoo Oh · Hanseul Cho · Chulhee Yun


Abstract:

Warm-starting neural networks by initializing them with previously learned weights is appealing, as practical neural networks are often deployed under a continuous influx of new data. However, this often leads to a phenomenon known as loss of plasticity, where the network loses its ability to learn new information and thereby shows worse generalization than networks trained from scratch. While this issue has been actively studied under non-stationary data distributions, it surprisingly occurs even when the data distribution remains stationary, and its underlying mechanism is poorly understood. To address this gap, we develop a framework emulating real-world neural network training. Under this framework, we identify noise memorization as the primary cause of plasticity loss when warm-starting networks on stationary data. Motivated by this discovery, we propose Direction-Aware SHrinking (DASH), a method that mitigates plasticity loss by selectively forgetting memorized noise while preserving learned features. We validate our approach on vision classification tasks, demonstrating consistent improvements in test accuracy and training efficiency.
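The abstract gives no pseudocode, so the following is only a minimal, illustrative sketch of what "direction-aware shrinking" could look like in PyTorch: before resuming training on newly arrived data, each output unit's weight row is compared against the negative gradient of the loss on that data, and rows misaligned with the descent direction (presumed memorized noise) are shrunk toward zero while aligned rows (presumed learned features) are kept. The function name `dash_shrink` and the hyperparameters `lam` (shrink factor) and `tau` (alignment threshold) are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def dash_shrink(model, loss, lam=0.3, tau=0.0):
    """Illustrative sketch (not the paper's algorithm): shrink weight
    rows whose direction disagrees with the descent direction on new data.

    `lam` and `tau` are hypothetical hyperparameters chosen for the example.
    """
    # Only weight tensors with an output-unit axis (linear/conv weights).
    params = [p for p in model.parameters() if p.requires_grad and p.ndim >= 2]
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            w = p.flatten(1)                        # one row per output unit
            d = -g.flatten(1)                       # loss-decreasing direction
            cos = F.cosine_similarity(w, d, dim=1)  # per-unit alignment
            scale = torch.where(cos >= tau,
                                torch.ones_like(cos),          # keep features
                                torch.full_like(cos, lam))     # forget noise
            p.mul_(scale.view(-1, *([1] * (p.ndim - 1))))      # broadcast shrink
```

Under these assumptions, the routine would be called once on the warm-started checkpoint, with `loss` computed on a batch of the incoming data, before standard training resumes.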
