Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective
Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu
We consider the learning of a single-index target function $f_*: \mathbb{R}^d\to\mathbb{R}$ under spiked covariance data: $f_*(\boldsymbol{x}) = \sigma_*\big(\tfrac{1}{\sqrt{1+\theta}}\langle\boldsymbol{x},\boldsymbol{\mu}\rangle\big)$, $\boldsymbol{x}\overset{\mathrm{i.i.d.}}{\sim}\mathcal{N}(0,\boldsymbol{I}_d + \theta\boldsymbol{\mu}\boldsymbol{\mu}^\top)$, where the link function $\sigma_*:\mathbb{R}\to\mathbb{R}$ is a degree-$p$ polynomial with information exponent $k$ (defined as the lowest degree appearing in the Hermite expansion of $\sigma_*$), and the target depends on the input $\boldsymbol{x}$ only through its projection onto the spike (signal) direction $\boldsymbol{\mu}\in\mathbb{R}^d$. In the proportional asymptotic limit where the number of training examples $n$ and the dimensionality $d$ jointly diverge, $n,d\to\infty$, $d/n\to\gamma\in(0,\infty)$, we ask the following question: how large should the spike magnitude $\theta$ (i.e., the strength of the low-dimensional component) be in order for $(i)$ kernel methods, and $(ii)$ a neural network trained with gradient descent, to learn $f_*$? We show that for kernel ridge regression, $\theta = \Omega\big(d^{1-\frac{1}{p}}\big)$ is both sufficient and necessary, whereas for a GD-trained two-layer neural network, $\theta=\Omega\big(d^{1-\frac{1}{k}}\big)$ suffices. Our results demonstrate that both kernel methods and neural networks benefit from low-dimensional structure in the data; moreover, since $k\le p$ by definition, neural networks can adapt to such structure more effectively.
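For concreteness, below is a minimal NumPy sketch of the data-generating model described in the abstract. The choices of $d$, $n$, the spike magnitude $\theta$, and the link function $\sigma_*$ (here the second Hermite polynomial, so $p=k=2$) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions); d/n plays the role of gamma in the proportional limit.
d, n = 512, 2048

# Example link: the second Hermite polynomial He_2(t) = t^2 - 1,
# so degree p = 2 and information exponent k = 2 (lowest nonzero Hermite term).
def sigma_star(t):
    return t ** 2 - 1

p = 2
theta = d ** (1 - 1 / p)   # spike magnitude at the kernel-learnability scale Omega(d^{1-1/p})

# Unit-norm spike (signal) direction mu.
mu = rng.standard_normal(d)
mu /= np.linalg.norm(mu)

# x ~ N(0, I_d + theta * mu mu^T): isotropic noise plus an amplified component along mu.
z = rng.standard_normal((n, d))
x = z + (np.sqrt(1.0 + theta) - 1.0) * np.outer(z @ mu, mu)

# Single-index labels: f_*(x) = sigma_*(<x, mu> / sqrt(1 + theta));
# the rescaling makes the argument of sigma_* have unit variance.
y = sigma_star((x @ mu) / np.sqrt(1.0 + theta))
```

Adding $(\sqrt{1+\theta}-1)\langle\boldsymbol{z},\boldsymbol{\mu}\rangle\boldsymbol{\mu}$ to isotropic noise $\boldsymbol{z}$ scales the component along $\boldsymbol{\mu}$ by $\sqrt{1+\theta}$, which yields exactly the covariance $\boldsymbol{I}_d+\theta\boldsymbol{\mu}\boldsymbol{\mu}^\top$.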
Author Information
Jimmy Ba (University of Toronto / xAI)
Murat Erdogdu (University of Toronto, Vector Institute)
Taiji Suzuki (The University of Tokyo / RIKEN)
Zhichao Wang (UC San Diego)
Denny Wu (University of Toronto)
More from the Same Authors
- 2021 : On Low Rank Training of Deep Neural Networks
  Siddhartha Kamalakara · Acyr Locatelli · Bharat Venkitesh · Jimmy Ba · Yarin Gal · Aidan Gomez
- 2021 : Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings
  Shunshi Zhang · Murat Erdogdu · Animesh Garg
- 2022 : You Can’t Count on Luck: Why Decision Transformers Fail in Stochastic Environments
  Keiran Paster · Sheila McIlraith · Jimmy Ba
- 2023 : Spectral Evolution and Invariance in Linear-width Neural Networks
  Zhichao Wang · Andrew Engel · Anand Sarwate · Ioana Dumitriu · Tony Chiang
- 2023 : Benign Overfitting of Two-Layer Neural Networks under Inputs with Intrinsic Dimension
  Shunta Akiyama · Kazusato Oko · Taiji Suzuki
- 2023 : Graph Neural Networks Provably Benefit from Structural Information: A Feature Learning Perspective
  Wei Huang · Yuan Cao · Haonan Wang · Xin Cao · Taiji Suzuki
- 2023 : Training on Thin Air: Improve Image Classification with Generated Data
  Yongchao Zhou · Hshmat Sahak · Jimmy Ba
- 2023 : A Generative Model for Text Control in Minecraft
  Shalev Lifshitz · Keiran Paster · Harris Chan · Jimmy Ba · Sheila McIlraith
- 2023 : Learning Green's Function Efficiently Using Low-Rank Approximations
  Kishan Wimalawarne · Taiji Suzuki · Sophie Langer
- 2023 : Calibrating Language Models via Augmented Prompt Ensembles
  Mingjian Jiang · Yangjun Ruan · Sicong Huang · Saifei Liao · Silviu Pitis · Roger Grosse · Jimmy Ba
- 2023 : Using Synthetic Data for Data Augmentation to Improve Classification Accuracy
  Yongchao Zhou · Hshmat Sahak · Jimmy Ba
- 2023 : Contributed talks 1
  MENGQI LOU · Zhichao Wang
- 2023 : Feature Learning in Two-layer Neural Networks under Structured Data, Murat A. Erdogdu
  Murat Erdogdu
- 2023 Poster: DIFF2: Differential Private Optimization via Gradient Differences for Nonconvex Distributed Learning
  Tomoya Murata · Taiji Suzuki
- 2023 Poster: Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems
  Atsushi Nitanda · Kazusato Oko · Denny Wu · Nobuhito Takenouchi · Taiji Suzuki
- 2023 Poster: TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation
  Zhaoyan Liu · Noël Vouitsis · Satya Krishna Gorti · Jimmy Ba · Gabriel Loaiza-Ganem
- 2023 Oral: Diffusion Models are Minimax Optimal Distribution Estimators
  Kazusato Oko · Shunta Akiyama · Taiji Suzuki
- 2023 Poster: Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input
  Shokichi Takakura · Taiji Suzuki
- 2023 Poster: Diffusion Models are Minimax Optimal Distribution Estimators
  Kazusato Oko · Shunta Akiyama · Taiji Suzuki
- 2023 Poster: Tight and fast generalization error bound of graph embedding in metric space
  Atsushi Suzuki · Atsushi Nitanda · Taiji Suzuki · Jing Wang · Feng Tian · Kenji Yamanishi
- 2021 Poster: On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting
  Shunta Akiyama · Taiji Suzuki
- 2021 Spotlight: On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting
  Shunta Akiyama · Taiji Suzuki
- 2021 Poster: Quantitative Understanding of VAE as a Non-linearly Scaled Isometric Embedding
  Akira Nakagawa · Keizo Kato · Taiji Suzuki
- 2021 Poster: LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning
  Yuhuai Wu · Markus Rabe · Wenda Li · Jimmy Ba · Roger Grosse · Christian Szegedy
- 2021 Spotlight: LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning
  Yuhuai Wu · Markus Rabe · Wenda Li · Jimmy Ba · Roger Grosse · Christian Szegedy
- 2021 Spotlight: Quantitative Understanding of VAE as a Non-linearly Scaled Isometric Embedding
  Akira Nakagawa · Keizo Kato · Taiji Suzuki
- 2021 Poster: Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning
  Tomoya Murata · Taiji Suzuki
- 2021 Spotlight: Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning
  Tomoya Murata · Taiji Suzuki
- 2020 Poster: Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning
  Silviu Pitis · Harris Chan · Stephen Zhao · Bradly Stadie · Jimmy Ba
- 2020 Poster: Improving Transformer Optimization Through Better Initialization
  Xiao Shi Huang · Felipe Perez · Jimmy Ba · Maksims Volkovs
- 2019 Poster: Approximation and non-parametric estimation of ResNet-type convolutional neural networks
  Kenta Oono · Taiji Suzuki
- 2019 Oral: Approximation and non-parametric estimation of ResNet-type convolutional neural networks
  Kenta Oono · Taiji Suzuki
- 2018 Poster: Functional Gradient Boosting based on Residual Network Perception
  Atsushi Nitanda · Taiji Suzuki
- 2018 Oral: Functional Gradient Boosting based on Residual Network Perception
  Atsushi Nitanda · Taiji Suzuki