We give the first superpolynomial lower bounds for learning one-layer neural networks with respect to the Gaussian distribution for a broad class of algorithms. In the regression setting, we prove that gradient descent run on any classifier with respect to square loss will fail to achieve small test error in polynomial time. Prior work held only for gradient descent run with small batch sizes and sufficiently smooth classifiers. For classification, we give a stronger result, namely that any statistical query (SQ) algorithm will fail to achieve small test error in polynomial time. Our lower bounds hold for commonly used activations such as ReLU and sigmoid. The core of our result relies on a novel construction of a simple family of neural networks that are exactly orthogonal with respect to all spherically symmetric distributions.
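The construction underlying the lower bound is a family of functions that are exactly orthogonal under spherically symmetric distributions. The following toy sketch is not the paper's actual construction; it only illustrates what "exactly orthogonal with respect to the Gaussian distribution" means, using probabilists' Hermite polynomials (which satisfy E[He_m(x)·He_n(x)] = 0 for m ≠ n under the standard Gaussian) and a Monte Carlo check:

```python
import numpy as np

# Probabilists' Hermite polynomials He_2 and He_3.
# Under x ~ N(0, 1) they are exactly orthogonal:
# E[He_2(x) * He_3(x)] = 0, so SQ-style correlational
# queries between them carry no signal.
def he2(x):
    return x**2 - 1

def he3(x):
    return x**3 - 3 * x

rng = np.random.default_rng(0)
x = rng.standard_normal(2_000_000)

# Monte Carlo estimate of the inner product under the Gaussian.
inner = np.mean(he2(x) * he3(x))
print(f"estimated E[He2(x) * He3(x)] = {inner:.4f}")
```

The estimate concentrates near zero as the sample size grows, which is the empirical signature of exact orthogonality; the paper's networks achieve this property simultaneously against all spherically symmetric distributions, not just the Gaussian.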
Author Information
Surbhi Goel (University of Texas at Austin)
Aravind Gollakota (University of Texas at Austin)
Zhihan Jin (Shanghai Jiao Tong University)
Sushrut Karmalkar (University of Texas at Austin)
Adam Klivans (University of Texas at Austin)
More from the Same Authors
- 2022 Poster: Understanding Contrastive Learning Requires Incorporating Inductive Biases
  Nikunj Umesh Saunshi · Jordan Ash · Surbhi Goel · Dipendra Kumar Misra · Cyril Zhang · Sanjeev Arora · Sham Kakade · Akshay Krishnamurthy
- 2022 Spotlight: Understanding Contrastive Learning Requires Incorporating Inductive Biases
  Nikunj Umesh Saunshi · Jordan Ash · Surbhi Goel · Dipendra Kumar Misra · Cyril Zhang · Sanjeev Arora · Sham Kakade · Akshay Krishnamurthy
- 2022 Poster: Inductive Biases and Variable Creation in Self-Attention Mechanisms
  Benjamin Edelman · Surbhi Goel · Sham Kakade · Cyril Zhang
- 2022 Spotlight: Inductive Biases and Variable Creation in Self-Attention Mechanisms
  Benjamin Edelman · Surbhi Goel · Sham Kakade · Cyril Zhang
- 2021 Poster: Statistical Estimation from Dependent Data
  Vardis Kandiros · Yuval Dagan · Nishanth Dikkala · Surbhi Goel · Constantinos Daskalakis
- 2021 Spotlight: Statistical Estimation from Dependent Data
  Vardis Kandiros · Yuval Dagan · Nishanth Dikkala · Surbhi Goel · Constantinos Daskalakis
- 2021 Poster: Fairness for Image Generation with Uncertain Sensitive Attributes
  Ajil Jalal · Sushrut Karmalkar · Jessica Hoffmann · Alexandros Dimakis · Eric Price
- 2021 Spotlight: Fairness for Image Generation with Uncertain Sensitive Attributes
  Ajil Jalal · Sushrut Karmalkar · Jessica Hoffmann · Alexandros Dimakis · Eric Price
- 2021 Poster: Instance-Optimal Compressed Sensing via Posterior Sampling
  Ajil Jalal · Sushrut Karmalkar · Alexandros Dimakis · Eric Price
- 2021 Spotlight: Instance-Optimal Compressed Sensing via Posterior Sampling
  Ajil Jalal · Sushrut Karmalkar · Alexandros Dimakis · Eric Price
- 2021 Poster: Acceleration via Fractal Learning Rate Schedules
  Naman Agarwal · Surbhi Goel · Cyril Zhang
- 2021 Spotlight: Acceleration via Fractal Learning Rate Schedules
  Naman Agarwal · Surbhi Goel · Cyril Zhang
- 2020 Poster: On the Power of Compressed Sensing with Generative Models
  Akshay Kamath · Eric Price · Sushrut Karmalkar
- 2020 Poster: Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection
  Mao Ye · Chengyue Gong · Lizhen Nie · Denny Zhou · Adam Klivans · Qiang Liu
- 2020 Poster: Learning Mixtures of Graphs from Epidemic Cascades
  Jessica Hoffmann · Soumya Basu · Surbhi Goel · Constantine Caramanis
- 2020 Poster: Efficiently Learning Adversarially Robust Halfspaces with Noise
  Omar Montasser · Surbhi Goel · Ilias Diakonikolas · Nati Srebro
- 2018 Poster: Learning One Convolutional Layer with Overlapping Patches
  Surbhi Goel · Adam Klivans · Raghu Meka
- 2018 Oral: Learning One Convolutional Layer with Overlapping Patches
  Surbhi Goel · Adam Klivans · Raghu Meka
- 2017 Poster: Exact MAP Inference by Avoiding Fractional Vertices
  Erik Lindgren · Alexandros Dimakis · Adam Klivans
- 2017 Talk: Exact MAP Inference by Avoiding Fractional Vertices
  Erik Lindgren · Alexandros Dimakis · Adam Klivans