

Oral in Workshop: HiLD: High-dimensional Learning Dynamics Workshop

Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective

Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu


Abstract: We consider the learning of a single-index target function $f: \mathbb{R}^d \to \mathbb{R}$ under spiked covariance data: $f(x) = \sigma\big(\tfrac{1}{\sqrt{1+\theta}}\langle x, \mu\rangle\big)$, $x \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, I_d + \theta\mu\mu^\top)$, where the link function $\sigma: \mathbb{R} \to \mathbb{R}$ is a degree-$p$ polynomial with information exponent $k$ (defined as the lowest nonzero degree in the Hermite expansion of $\sigma$), and the target depends on the input $x$ only through its projection onto the spike (signal) direction $\mu \in \mathbb{R}^d$. In the proportional asymptotic limit where the number of training examples $n$ and the dimensionality $d$ jointly diverge, $n, d \to \infty$ with $d/n \to \gamma \in (0, \infty)$, we ask the following question: how large should the spike magnitude $\theta$ (i.e., the strength of the low-dimensional component) be in order for (i) kernel methods, and (ii) neural networks trained with gradient descent, to learn $f$? We show that for kernel ridge regression, $\theta = \Omega(d^{1-\frac{1}{p}})$ is both sufficient and necessary, whereas for a GD-trained two-layer neural network, $\theta = \Omega(d^{1-\frac{1}{k}})$ suffices. Our result demonstrates that both kernel methods and neural networks benefit from low-dimensional structure in the data; moreover, since $k \le p$ by definition, neural networks can adapt to such structure more effectively.
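
For concreteness, the following is a minimal Python sketch of the data model above: spiked-covariance inputs and a single-index label. The unit-norm spike, the cubic Hermite link (so p = k = 3), and the particular values of n, d, and theta are illustrative assumptions, not taken from the paper.

# Sketch of the spiked-covariance single-index model described in the abstract.
import numpy as np

rng = np.random.default_rng(0)

d, n = 500, 1000          # dimension and sample size (proportional regime: d/n stays of order one)
k = 3                     # information exponent of the illustrative link below
theta = d ** (1 - 1 / k)  # spike magnitude at the d^{1 - 1/k} scaling discussed above

# Spike (signal) direction mu, normalized to unit length (assumption).
mu = rng.standard_normal(d)
mu /= np.linalg.norm(mu)

# x ~ N(0, I_d + theta * mu mu^T): isotropic noise plus a rank-one spike.
z = rng.standard_normal((n, d))
s = rng.standard_normal((n, 1))
X = z + np.sqrt(theta) * s * mu

# Single-index target: y depends on x only through <x, mu>, rescaled by
# 1/sqrt(1 + theta) so that the argument of sigma has unit variance.
proj = X @ mu / np.sqrt(1 + theta)

def sigma(t):
    # Third Hermite polynomial He_3: degree p = 3, information exponent k = 3.
    return t ** 3 - 3 * t

y = sigma(proj)
print(X.shape, y.shape, round(proj.var(), 2))  # proj.var() should be close to 1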
