Timezone: »

Superpolynomial Lower Bounds for Learning One-Layer Neural Networks using Gradient Descent
Surbhi Goel · Aravind Gollakota · Zhihan Jin · Sushrut Karmalkar · Adam Klivans

Wed Jul 15 08:00 AM -- 08:45 AM & Wed Jul 15 07:00 PM -- 07:45 PM (PDT) @ Virtual

We give the first superpolynomial lower bounds for learning one-layer neural networks with respect to the Gaussian distribution for a broad class of algorithms. In the regression setting, we prove that gradient descent run on any classifier with respect to square loss will fail to achieve small test error in polynomial time. Prior work held only for gradient descent run with small batch sizes and sufficiently smooth classifiers. For classification, we give a stronger result, namely that any statistical query (SQ) algorithm will fail to achieve small test error in polynomial time. Our lower bounds hold for commonly used activations such as ReLU and sigmoid. The core of our result relies on a novel construction of a simple family of neural networks that are exactly orthogonal with respect to all spherically symmetric distributions.

Author Information

Surbhi Goel (University of Texas at Austin)
Aravind Gollakota (University of Texas at Austin)
Zhihan Jin (Shanghai Jiao Tong University)
Sushrut Karmalkar (University of Texas at Austin)
Adam Klivans (University of Texas at Austin)

More from the Same Authors