Spotlight in Workshop: Understanding and Improving Generalization in Deep Learning
Overparameterization without Overfitting: Jacobian-based Generalization Guarantees for Neural Networks
Authors: Samet Oymak, Mingchen Li, Zalan Fabian and Mahdi Soltanolkotabi
Abstract: Many modern neural network architectures contain far more parameters than the number of training samples. Such networks can easily overfit the training data, hence it is crucial to understand the fundamental principles that facilitate good test accuracy. This paper explores the generalization capabilities of neural networks trained via gradient descent. We show that the Jacobian matrix associated with the network dictates the directions where learning is generalizable and fast versus the directions where overfitting occurs and learning is slow. We develop a bias-variance theory which provides a control knob to split the Jacobian spectrum into "information" and "nuisance" spaces associated with the large and small singular values of the Jacobian. We show that (i) over the information space learning is fast and we can quickly train a model with zero training loss that also generalizes well, and (ii) over the nuisance space overfitting can result in higher variance, hence early stopping helps generalization at the expense of some bias. We conduct numerical experiments on deep networks that corroborate our theory and demonstrate that: (i) the Jacobian of typical networks exhibits a bimodal structure, with a few large singular values and many small ones, leading to a low-dimensional information space; and (ii) most of the useful information lies on the information space, where learning happens quickly.