Invited talk
Workshop: Theoretical Physics for Deep Learning
Linearized two-layers neural networks in high dimension
Speaker: Andrea Montanari (Stanford)
Abstract: We consider the problem of learning an unknown function f on the d-dimensional sphere with respect to the square loss, given i.i.d. samples (y_i, x_i), where x_i is a feature vector uniformly distributed on the sphere and y_i = f(x_i). We study two popular classes of models that can be regarded as linearizations of two-layer neural networks around a random initialization: (RF) the random features model of Rahimi-Recht; (NT) the neural tangent kernel model of Jacot-Gabriel-Hongler. Both of these approaches can also be regarded as randomized approximations of kernel ridge regression (with respect to different kernels), and hence enjoy universal approximation properties when the number of neurons N diverges, for fixed dimension d.
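For readers who want a concrete picture, here is a minimal numerical sketch (not part of the talk materials) of the two linearized models with a ReLU activation. The sphere radius sqrt(d), the helper names (sphere_samples, rf_features, nt_features, ridge_fit), and the use of plain ridge regression are illustrative assumptions, not the talk's exact setup.

import numpy as np

def sphere_samples(n, d, rng):
    # n points uniform on the sphere of radius sqrt(d) in R^d (one common normalization).
    x = rng.standard_normal((n, d))
    return np.sqrt(d) * x / np.linalg.norm(x, axis=1, keepdims=True)

def rf_features(X, W):
    # RF model: f(x) = sum_i a_i relu(<w_i, x>); only the top-layer weights a are trained,
    # so fitting reduces to regression on the features relu(X W^T).
    return np.maximum(X @ W.T, 0.0)                               # shape (n, N)

def nt_features(X, W):
    # NT model: features are gradients of the network w.r.t. the first-layer weights,
    # i.e. phi(x)_{i,:} = relu'(<w_i, x>) * x, giving N*d features per sample.
    act = (X @ W.T > 0).astype(float)                             # relu'(<w_i, x>)
    return (act[:, :, None] * X[:, None, :]).reshape(len(X), -1)  # shape (n, N*d)

def ridge_fit(Phi, y, lam):
    # Both linearized models reduce to ridge regression in their feature space.
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)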
We prove that, if both d and N are large, the behavior of these models is instead remarkably simpler. If N is of smaller order than d^2, then RF performs no better than linear regression with respect to the raw features x_i, and NT performs no better than linear regression with respect to degree-one and degree-two monomials in the x_i's. More generally, if N is of smaller order than d^{k+1}, then RF fits at most a degree-k polynomial in the raw features, and NT fits at most a degree-(k+1) polynomial. We then focus on the case of quadratic functions and N = O(d). We show that the gap in generalization error between fully trained neural networks and the linearized models is potentially unbounded. [Based on joint work with Behrooz Ghorbani, Song Mei, and Theodor Misiakiewicz.]
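To make the first threshold concrete, the following hedged experiment (reusing the helpers from the sketch above; the quadratic target, the sizes d = N = 50, and the ridge parameter are arbitrary illustrative choices) compares RF ridge regression with N of order d neurons against plain linear regression on the raw features, for a purely quadratic target. The abstract's result says that, asymptotically, RF should do no better than linear regression in this regime; at such small sizes the comparison is only suggestive.

import numpy as np

rng = np.random.default_rng(0)
d, N, n_train, n_test, lam = 50, 50, 2000, 2000, 1e-3

def quadratic_target(X):
    # Hypothetical target f(x) = x^T B x / d with a fixed trace-zero diagonal B
    # (no linear or constant component, so linear regression on raw features cannot fit it).
    B = np.diag(np.linspace(-1.0, 1.0, X.shape[1]))
    return np.einsum("ni,ij,nj->n", X, B, X) / X.shape[1]

W = rng.standard_normal((N, d)) / np.sqrt(d)                      # random first-layer weights
X_tr, X_te = sphere_samples(n_train, d, rng), sphere_samples(n_test, d, rng)
y_tr, y_te = quadratic_target(X_tr), quadratic_target(X_te)

for name, feats in [("linear", lambda X: X), ("RF, N = d", lambda X: rf_features(X, W))]:
    a = ridge_fit(feats(X_tr), y_tr, lam)
    rel_err = np.mean((feats(X_te) @ a - y_te) ** 2) / np.mean((y_te - y_te.mean()) ** 2)
    print(f"{name}: relative test error {rel_err:.3f}")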