Oral in Workshop: HiLD: High-dimensional Learning Dynamics Workshop
Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective
Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu
Abstract:
We consider the learning of a single-index target function $f_*:\mathbb{R}^d\to\mathbb{R}$ under spiked covariance data: $f_*(x)=\sigma_*\!\big(\tfrac{1}{\sqrt{1+\theta}}\langle x,\mu\rangle\big)$, $x \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, I_d+\theta\mu\mu^\top)$, where the link function $\sigma_*:\mathbb{R}\to\mathbb{R}$ is a degree-$p$ polynomial with information exponent $k$ (defined as the lowest degree in the Hermite expansion of $\sigma_*$), and the target depends on the projection of the input $x$ onto the spike (signal) direction $\mu\in\mathbb{R}^d$. In the proportional asymptotic limit where the number of training examples $n$ and the dimensionality $d$ jointly diverge, $n,d\to\infty$, $d/n\to\gamma\in(0,\infty)$, we ask the following question: how large should the spike magnitude $\theta$ (i.e., the strength of the low-dimensional component) be in order for (i) kernel methods, (ii) neural networks trained with gradient descent, to learn $f_*$? We show that for kernel ridge regression, $\theta=\Omega(d^{1-1/p})$ is both sufficient and necessary, whereas for a GD-trained two-layer neural network, $\theta=\Omega(d^{1-1/k})$ suffices. Our result demonstrates that both kernel methods and neural networks benefit from low-dimensional structure in the data; moreover, since $k\le p$ by definition, neural networks can adapt to such structure more effectively.
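As a rough illustration of the data model described in the abstract (not code from the paper), the sketch below samples spiked-covariance inputs and single-index labels. The specific link function, spike direction, spike magnitude, and sizes (`sigma_star`, `mu`, `theta`, `d`, `n`) are hypothetical choices used only to show how the degree $p$ and the information exponent $k$ can differ.

```python
import numpy as np

# Minimal sketch of the spiked single-index model from the abstract:
# inputs x ~ N(0, I_d + theta * mu mu^T) and labels f_*(x) = sigma_*(<x, mu> / sqrt(1 + theta)).
# All numerical choices below are illustrative assumptions, not the paper's setup.

rng = np.random.default_rng(0)
d, n = 512, 2048          # dimension and sample size (d/n plays the role of gamma)
p = 4                     # degree of the polynomial link chosen below
theta = d ** (1 - 1 / p)  # spike magnitude at the Omega(d^{1-1/p}) scale discussed for kernel methods

mu = rng.standard_normal(d)
mu /= np.linalg.norm(mu)  # unit-norm spike (signal) direction

# Sample x = z + sqrt(theta) * g * mu with z ~ N(0, I_d), g ~ N(0, 1),
# which gives covariance I_d + theta * mu mu^T.
z = rng.standard_normal((n, d))
g = rng.standard_normal((n, 1))
X = z + np.sqrt(theta) * g * mu

# Normalized projection onto the spike; it is standard Gaussian under this model.
s = X @ mu / np.sqrt(1 + theta)

def sigma_star(t):
    # He_2(t) + He_4(t): a degree p = 4 polynomial whose Hermite expansion
    # starts at degree 2, i.e. information exponent k = 2 < p.
    return (t**2 - 1) + (t**4 - 6 * t**2 + 3)

y = sigma_star(s)        # noiseless labels f_*(x_i)
print(X.shape, y.shape)  # (2048, 512) (2048,)
```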