Workshop: Over-parameterization: Pitfalls and Opportunities

On the Generalization Improvement from Neural Network Pruning

Tian Jin · Gintare Karolina Dziugaite · Michael Carbin


Even though the goal of pruning is usually to reduce the computational resources consumed during training or inference, it comes as no surprise to theoreticians or practitioners that pruning can also improve generalization. In this work, we empirically study pruning's effect on generalization, focusing on two state-of-the-art pruning algorithms: weight rewinding and learning-rate rewinding. However, each pruning algorithm is in fact an aggregation of many design choices: a weight-scoring heuristic, a pruning schedule, and a learning-rate schedule, among other factors, each of which might contribute to the generalization improvement in a different way. We therefore ablate each design choice to determine whether it is responsible for pruning's effect on generalization. We find that each individual contribution is limited compared to the generalization improvement achieved by the pruning algorithm in its entirety. Our results also highlight similarities and differences between the effects on generalization of pruning and of scaling model width.
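To make the design choices named above concrete, here is a minimal sketch of iterative magnitude pruning with weight rewinding on a toy least-squares problem. It is an illustration only, not the authors' experimental setup: the weight-scoring heuristic (magnitude), the pruning schedule (50% of surviving weights per round), and all hyperparameters are illustrative assumptions, and the "rewind point" is simply the initialization for brevity.

```python
import numpy as np

def train(w, mask, X, y, lr=0.1, steps=200):
    """Gradient descent on least squares, keeping pruned weights at zero."""
    for _ in range(steps):
        grad = X.T @ (X @ (w * mask) - y) / len(y)
        w = (w - lr * grad) * mask
    return w

def magnitude_prune(mask, w, frac):
    """Weight-scoring heuristic: zero out the smallest-magnitude
    fraction of the weights that are still alive."""
    alive = np.flatnonzero(mask)
    k = int(len(alive) * frac)
    drop = alive[np.argsort(np.abs(w[alive]))[:k]]
    new_mask = mask.copy()
    new_mask[drop] = 0.0
    return new_mask

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 20))
true_w = np.zeros(20)
true_w[:5] = rng.normal(size=5)          # sparse ground truth
y = X @ true_w

w0 = rng.normal(scale=0.1, size=20)      # rewind point (here: the init)
mask = np.ones(20)
w = train(w0.copy(), mask, X, y)
for _ in range(3):                       # pruning schedule: 3 rounds
    mask = magnitude_prune(mask, w, 0.5) # prune 50% of surviving weights
    w = train(w0 * mask, mask, X, y)     # weight rewinding: restart from w0
```

Replacing the rewind step with a reset of the learning-rate schedule while keeping the trained weights would give learning-rate rewinding instead; the ablations described in the abstract vary these components one at a time.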