

Oral in Workshop: HiLD: High-dimensional Learning Dynamics Workshop

Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective

Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu


Abstract: We consider learning a single-index target function $f_*: \mathbb{R}^d\to\mathbb{R}$ under spiked covariance data: $f_*(\boldsymbol{x}) = \textstyle\sigma_*(\frac{1}{\sqrt{1+\theta}}\langle\boldsymbol{x},\boldsymbol{\mu}\rangle)$, $\boldsymbol{x}\overset{\small\mathrm{i.i.d.}}{\sim}\mathcal{N}(0,\boldsymbol{I}_d + \theta\boldsymbol{\mu}\boldsymbol{\mu}^\top)$, where the link function $\sigma_*:\mathbb{R}\to\mathbb{R}$ is a degree-$p$ polynomial with information exponent $k$ (defined as the lowest degree in the Hermite expansion of $\sigma_*$), and the target depends on the input $\boldsymbol{x}$ only through its projection onto the spike (signal) direction $\boldsymbol{\mu}\in\mathbb{R}^d$. In the proportional asymptotic limit where the number of training examples $n$ and the dimensionality $d$ jointly diverge, $n,d\to\infty$, $d/n\to\gamma\in(0,\infty)$, we ask the following question: how large should the spike magnitude $\theta$ (i.e., the strength of the low-dimensional component) be in order for $(i)$ kernel methods or $(ii)$ a neural network trained with gradient descent to learn $f_*$? We show that for kernel ridge regression, $\theta = \Omega\big(d^{1-\frac{1}{p}}\big)$ is both sufficient and necessary, whereas for a GD-trained two-layer neural network, $\theta=\Omega\big(d^{1-\frac{1}{k}}\big)$ suffices. Our results demonstrate that both kernel methods and neural networks benefit from low-dimensional structure in the data; moreover, since $k\le p$ by definition, neural networks can adapt to such structure more effectively.
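
Below is a minimal sketch (not from the paper) of the spiked single-index data model described in the abstract, together with a kernel ridge regression baseline. The specific choices are assumptions made for illustration: the link function is taken to be the Hermite polynomial $\mathrm{He}_3(z)=z^3-3z$ (so $p=k=3$), the spike magnitude is set at the abstract's kernel-learnability scale $\theta \asymp d^{1-1/p}$, and the polynomial kernel, sample sizes, and ridge parameter are arbitrary illustrative values rather than the authors' experimental setup.

```python
import numpy as np

# Illustrative sketch of the spiked single-index model; all parameter choices are assumptions.
d, n = 200, 400                      # dimension and sample size (d/n = gamma = 0.5)
theta = d ** (2 / 3)                 # spike magnitude ~ d^{1 - 1/p} for p = 3
rng = np.random.default_rng(0)

mu = rng.standard_normal(d)
mu /= np.linalg.norm(mu)             # unit spike (signal) direction

def sigma_star(z):
    """Assumed link function: Hermite polynomial He_3, so degree p = information exponent k = 3."""
    return z ** 3 - 3 * z

def sample(m):
    """Draw m samples with x ~ N(0, I_d + theta * mu mu^T) and labels y = f_*(x)."""
    g = rng.standard_normal((m, d))          # isotropic component
    s = rng.standard_normal((m, 1))          # amplitude along the spike
    X = g + np.sqrt(theta) * s * mu          # covariance I_d + theta * mu mu^T
    y = sigma_star(X @ mu / np.sqrt(1 + theta))
    return X, y

X_train, y_train = sample(n)
X_test, y_test = sample(n)

# Kernel ridge regression with a degree-3 polynomial kernel (illustrative baseline).
lam = 1e-3
K = (1 + X_train @ X_train.T / d) ** 3
alpha = np.linalg.solve(K + lam * np.eye(n), y_train)
y_pred = (1 + X_test @ X_train.T / d) ** 3 @ alpha
print("test R^2:", 1 - np.mean((y_pred - y_test) ** 2) / np.var(y_test))
```

Under this scaling of $\theta$, the projection $\langle\boldsymbol{x},\boldsymbol{\mu}\rangle$ carries a macroscopic signal, which is the regime in which the abstract states kernel ridge regression can learn $f_*$; shrinking `theta` well below `d ** (2 / 3)` in this sketch should correspondingly degrade the kernel fit.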
