Workshop: Over-parameterization: Pitfalls and Opportunities

On the Sparsity of Deep Neural Networks in the Overparameterized Regime: An Empirical Study

Rahul Parhi · Jack Wolf · Robert Nowak


Sparsity and low-rank structures have been incorporated into neural networks to reduce computational complexity and to improve generalization and robustness. Recent theoretical developments show that both are natural characteristics of data-fitting solutions cast in a new family of Banach spaces referred to as RBV2 spaces, the spaces of second-order bounded variation in the Radon domain. Moreover, sparse and deep ReLU networks are solutions to infinite-dimensional variational problems in compositions of these spaces. This means that these learning problems can be recast as parametric optimizations over neural network weights. Remarkably, standard weight decay and its variants correspond exactly to regularizing the RBV2-norm in the function space. The empirical validation in this paper confirms that weight decay leads to sparse and low-rank networks, as predicted by the theory.
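
The link between weight decay and a function-space norm can be illustrated on a single ReLU neuron. The sketch below is not taken from the paper. It assumes a toy neuron x -> v * relu(<w, x>) with no bias, and the helper names (`neuron`, `weight_decay`, `path_norm`, `rescale`) are hypothetical. Because the ReLU is positively homogeneous, rescaling (v, w) to (v / a, a * w) for any a > 0 leaves the function unchanged, while the weight-decay penalty (v^2 + ||w||^2) / 2 varies with a. Its minimum over a equals |v| * ||w||, a quantity that depends only on the function the neuron computes. This is one standard way to see why a penalty on the weights can act as a regularizer on the function itself.

```python
import math
import random


def relu(z):
    return max(z, 0.0)


def neuron(v, w, x):
    """Toy single ReLU neuron without bias: v * relu(<w, x>)."""
    return v * relu(sum(wi * xi for wi, xi in zip(w, x)))


def weight_decay(v, w):
    """Half the squared Euclidean norm of the neuron's parameters."""
    return 0.5 * (v * v + sum(wi * wi for wi in w))


def path_norm(v, w):
    """|v| * ||w||_2, which is invariant to the rescaling below."""
    return abs(v) * math.sqrt(sum(wi * wi for wi in w))


def rescale(v, w, a):
    """Function-preserving rescaling (v, w) -> (v / a, a * w), a > 0."""
    return v / a, [a * wi for wi in w]


# Arbitrary illustrative values, not data from the paper.
random.seed(0)
v, w = 3.0, [0.5, -1.0, 2.0]
x = [random.uniform(-1.0, 1.0) for _ in w]

# The penalty is minimized when the two layers are balanced:
# a^2 = |v| / ||w||_2, so that |v / a| == ||a * w||_2.
w_norm = math.sqrt(sum(wi * wi for wi in w))
a_star = math.sqrt(abs(v) / w_norm)
v_bal, w_bal = rescale(v, w, a_star)

print("same output:", math.isclose(neuron(v, w, x), neuron(v_bal, w_bal, x)))
print("weight decay before:", weight_decay(v, w))
print("weight decay after: ", weight_decay(v_bal, w_bal))
print("path norm:          ", path_norm(v, w))
```

At the balanced scale the weight-decay value coincides with the path norm, and any other scale gives a larger penalty. For a network that sums many such neurons, the same argument turns the total weight decay into a sum of per-neuron products |v_k| * ||w_k||. That sum is an l1-type penalty, which is consistent with the sparsity that the abstract reports.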