Invited talk in Workshop: Over-parameterization: Pitfalls and Opportunities
The generalization behavior of random feature and neural tangent models
Andrea Montanari
I consider two-layer neural networks trained with square loss in the linear (lazy) regime. Under overparametrization, gradient descent converges to the minimum-norm interpolant; I consider this interpolant as well as the whole ridge regularization path. From a statistical viewpoint, these approaches are random feature models, albeit of a special type. They are also equivalent to kernel ridge regression with a random kernel of rank N·d (where N is the number of hidden neurons and d the input dimension). I will describe a precise characterization of the generalization error when N, d, and the sample size are polynomially related (and for covariates that are uniform on the d-dimensional sphere). I will then discuss the limitations of these approaches. I will explain how sparse random feature models can be learned efficiently to try to address these limitations. [Based on joint work with Michael Celentano, Song Mei, Theodor Misiakiewicz, and Yiqiao Zhong]
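The sketch below is only a minimal illustration of the setting described above, not the talk's experiments: a random features model with fixed first-layer weights, fit along a ridge regularization path and, in the ridgeless limit, by the minimum-norm interpolant. The activation, target function, and problem sizes are my own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, N = 200, 20, 400          # sample size, input dimension, hidden neurons (illustrative)

# Covariates uniform on the d-dimensional sphere (radius sqrt(d))
X = rng.standard_normal((n, d))
X *= np.sqrt(d) / np.linalg.norm(X, axis=1, keepdims=True)
y = X @ rng.standard_normal(d) / np.sqrt(d)     # illustrative linear target

# Random first-layer weights, kept fixed as in the linear (lazy) regime
W = rng.standard_normal((N, d)) / np.sqrt(d)
Phi = np.maximum(X @ W.T, 0.0)                  # random ReLU features, shape (n, N)

# Ridge regularization path: a_lam = argmin_a ||y - Phi a||^2 + lam ||a||^2
for lam in [1e-1, 1e-3, 1e-6]:
    a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)
    print(f"lambda={lam:.0e}  train MSE={np.mean((Phi @ a - y) ** 2):.2e}")

# Ridgeless limit: minimum-norm interpolant via the pseudoinverse.
# Since N > n, the features typically interpolate the data exactly, and
# gradient descent initialized at zero converges to this solution.
a_min_norm = np.linalg.pinv(Phi) @ y
print("min-norm train MSE:", np.mean((Phi @ a_min_norm - y) ** 2))
```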