Among theories of why large-scale machine learning models generalize despite being vastly overparameterized, which assumptions are needed to capture the qualitative phenomena of generalization in the real world? On one hand, we find that most theoretical analyses fall short of capturing these qualitative phenomena even for kernel regression, when applied to kernels derived from large-scale neural networks (e.g., ResNet-50) and real data (e.g., CIFAR-100). On the other hand, we find that the classical GCV estimator (Craven and Wahba, 1978) accurately predicts generalization risk even in such overparameterized settings. To bolster this empirical finding, we prove that the GCV estimator converges to the generalization risk whenever a local random matrix law holds. Finally, we apply this random matrix theory lens to explain why pretrained representations generalize better as well as what factors govern scaling laws for kernel regression. Our findings suggest that random matrix theory, rather than just being a toy model, may be central to understanding the properties of neural representations in practice.
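As a reference point, the classical GCV score that the abstract refers to can be sketched for kernel ridge regression as follows. This is a minimal illustration of the Craven–Wahba formula, not the paper's own code; the RBF kernel, bandwidth, and synthetic data below are placeholder assumptions for demonstration only.

```python
import numpy as np

def gcv_score(K, y, lam):
    """Generalized cross-validation (GCV) score for kernel ridge regression.

    K: (n, n) kernel Gram matrix; y: (n,) targets; lam: ridge parameter.
    GCV(lam) = (mean squared residual) / (mean of diag(I - A))**2,
    where A = K (K + lam*I)^{-1} is the smoother ("hat") matrix.
    """
    n = K.shape[0]
    A = K @ np.linalg.solve(K + lam * np.eye(n), np.eye(n))
    resid = y - A @ y
    denom = (np.trace(np.eye(n) - A) / n) ** 2
    return (resid @ resid / n) / denom

# Toy usage: an RBF kernel on synthetic data; pick lam minimizing GCV.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=80)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / (2 * 5.0))  # placeholder bandwidth
lams = [1e-3, 1e-2, 1e-1, 1.0]
best = min(lams, key=lambda l: gcv_score(K, y, l))
```

The paper's point is that this estimator, computed from training data alone, tracks the true generalization risk even for kernels derived from large neural networks.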
Author Information
Alexander Wei (UC Berkeley)
Wei Hu (UC Berkeley / UMich)
Jacob Steinhardt (UC Berkeley)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Poster: More Than a Toy: Random Matrix Models Predict How Real-World Neural Representations Generalize »
  Wed. Jul 20 through Thu. Jul 21, Hall E #320
More from the Same Authors
- 2022: Distribution Shift Through the Lens of Explanations »
  Jacob Steinhardt
- 2022 Poster: Scaling Out-of-Distribution Detection for Real-World Settings »
  Dan Hendrycks · Steven Basart · Mantas Mazeika · Andy Zou · Joseph Kwon · Mohammadreza Mostajabi · Jacob Steinhardt · Dawn Song
- 2022 Poster: Predicting Out-of-Distribution Error with the Projection Norm »
  Yaodong Yu · Zitong Yang · Alexander Wei · Yi Ma · Jacob Steinhardt
- 2022 Spotlight: Scaling Out-of-Distribution Detection for Real-World Settings »
  Dan Hendrycks · Steven Basart · Mantas Mazeika · Andy Zou · Joseph Kwon · Mohammadreza Mostajabi · Jacob Steinhardt · Dawn Song
- 2022 Spotlight: Predicting Out-of-Distribution Error with the Projection Norm »
  Yaodong Yu · Zitong Yang · Alexander Wei · Yi Ma · Jacob Steinhardt
- 2022 Poster: Describing Differences between Text Distributions with Natural Language »
  Ruiqi Zhong · Charlie Snell · Dan Klein · Jacob Steinhardt
- 2022 Spotlight: Describing Differences between Text Distributions with Natural Language »
  Ruiqi Zhong · Charlie Snell · Dan Klein · Jacob Steinhardt
- 2021 Poster: A Representation Learning Perspective on the Importance of Train-Validation Splitting in Meta-Learning »
  Nikunj Umesh Saunshi · Arushi Gupta · Wei Hu
- 2021 Spotlight: A Representation Learning Perspective on the Importance of Train-Validation Splitting in Meta-Learning »
  Nikunj Umesh Saunshi · Arushi Gupta · Wei Hu
- 2021 Poster: Near-Optimal Linear Regression under Distribution Shift »
  Qi Lei · Wei Hu · Jason Lee
- 2021 Spotlight: Near-Optimal Linear Regression under Distribution Shift »
  Qi Lei · Wei Hu · Jason Lee
- 2020 Poster: Rethinking Bias-Variance Trade-off for Generalization of Neural Networks »
  Zitong Yang · Yaodong Yu · Chong You · Jacob Steinhardt · Yi Ma
- 2019 Poster: Width Provably Matters in Optimization for Deep Linear Neural Networks »
  Simon Du · Wei Hu
- 2019 Oral: Width Provably Matters in Optimization for Deep Linear Neural Networks »
  Simon Du · Wei Hu
- 2019 Poster: Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks »
  Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Ruosong Wang
- 2019 Oral: Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks »
  Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Ruosong Wang