Workshop: 8th ICML Workshop on Automated Machine Learning (AutoML 2021)

Dynamic Pruning of a Neural Network via Gradient Signal-to-Noise Ratio

Julien Siems


While training highly overparameterized neural networks is common practice in deep learning, research into post-hoc weight-pruning suggests that more than 90% of parameters can be removed without loss in predictive performance. To save resources, zero-shot and one-shot pruning attempt to find such a sparse representation at initialization or at an early stage of training. Though efficient, there is no justification, why the sparsity structure should not change during training. Dynamic sparsity pruning undoes this limitation and allows to adapt the structure of the sparse neural network during training. Recent approaches rely on weight magnitude pruning, which has been shown to be sub-optimal when applied at earlier training stages. In this work we propose to use the gradient noise to make pruning decisions. The procedure enables us to automatically adjust the sparsity during training without imposing a hand-designed sparsity schedule, while at the same time being able to recover from previous pruning decisions by unpruning connections as necessary. We evaluate our new method on image and tabular datasets and demonstrate that we reach similar performance as the dense model from which extract the sparse network, while exposing less hyperparameters than other dynamic sparsity methods.