Poster in Workshop: Over-parameterization: Pitfalls and Opportunities
On the Sparsity of Deep Neural Networks in the Overparameterized Regime: An Empirical Study
Rahul Parhi · Jack Wolf · Robert Nowak
Sparsity and low-rank structures have been incorporated into neural networks to reduce computational complexity and to improve generalization and robustness. Recent theoretical developments show that both are natural characteristics of data-fitting solutions cast in a new family of Banach spaces referred to as RBV2 spaces, the spaces of second-order bounded variation in the Radon domain. Moreover, sparse and deep ReLU networks are solutions to infinite-dimensional variational problems posed in compositions of these spaces. This means that these learning problems can be recast as parametric optimizations over neural network weights. Remarkably, standard weight decay and its variants correspond exactly to regularizing the RBV2-norm in the function space. The empirical validation in this paper confirms that weight decay leads to sparse and low-rank networks, as predicted by the theory.
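To illustrate the kind of empirical check the abstract describes, the sketch below trains an overparameterized ReLU network with standard weight decay in PyTorch and then measures neuron-level sparsity and the effective rank of an intermediate weight matrix. This is a minimal sketch under illustrative assumptions; the toy data, architecture, thresholds, and variable names are not taken from the paper's actual experiments.

```python
# Minimal sketch (not the authors' code): train a deep ReLU network with
# standard weight decay and inspect sparsity / low-rank structure afterward.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data in 10 dimensions (illustrative assumption).
d = 10
x = torch.randn(2048, d)
y = torch.sin(x @ torch.randn(d, 1)) + 0.05 * torch.randn(2048, 1)

# Overparameterized deep ReLU network.
width = 256
model = nn.Sequential(
    nn.Linear(d, width), nn.ReLU(),
    nn.Linear(width, width), nn.ReLU(),
    nn.Linear(width, 1),
)

# Standard weight decay, which the theory identifies with RBV2-norm regularization.
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.MSELoss()

for _ in range(3000):
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

# Neuron-level sparsity of the last hidden layer: units whose incoming and
# outgoing weights are both negligible are effectively pruned away.
W_in = model[2].weight.detach()               # (width, width) incoming weights
w_out = model[4].weight.detach().squeeze(0)   # (width,) outgoing weights
scores = W_in.norm(dim=1) * w_out.abs()
active = (scores > 1e-3 * scores.max()).sum().item()
print(f"active hidden units: {active} / {width}")

# Low-rank structure: effective rank of the intermediate weight matrix,
# measured by thresholding its singular values (threshold is an arbitrary choice).
s = torch.linalg.svdvals(W_in)
eff_rank = (s > 1e-3 * s.max()).sum().item()
print(f"effective rank of intermediate layer: {eff_rank} / {min(W_in.shape)}")
```

Under the theory summarized above, one would expect the active-unit count and the effective rank to drop noticeably as the weight-decay coefficient is increased, relative to training the same model with weight decay set to zero.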