Oral
A Convergence Theory for Deep Learning via Over-Parameterization
Zeyuan Allen-Zhu · Yuanzhi Li · Zhao Song
Deep neural networks (DNNs) have demonstrated dominant performance in many fields; since AlexNet, the neural networks used in practice have grown wider and deeper. On the theoretical side, a long line of works has focused on why neural networks with only one hidden layer can be trained. The theory of multi-layer networks remains somewhat unsettled.
In this work, we prove why simple algorithms such as stochastic gradient descent (SGD) can find GLOBAL MINIMA on the training objective of DNNs in POLYNOMIAL TIME. We make only two assumptions: the inputs do not degenerate and the network is over-parameterized. The latter means the number of hidden neurons is sufficiently large: polynomial in $L$, the number of DNN layers, and in $n$, the number of training samples.
As concrete examples, on the training set and starting from randomly initialized weights, we show that SGD attains 100\% accuracy in classification tasks, or minimizes regression loss at a linear convergence rate $\varepsilon \propto e^{-\Omega(T)}$, with a number of iterations that scales only polynomially in $n$ and $L$. Our theory applies to the widely-used but non-smooth ReLU activation, and to any smooth, possibly non-convex loss function. In terms of network architectures, our theory applies at least to fully-connected neural networks, convolutional neural networks (CNNs), and residual neural networks (ResNets).
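The linear rate $\varepsilon \propto e^{-\Omega(T)}$ is easy to observe empirically. Below is a minimal numerical sketch, not the paper's exact construction or parameter regime: it trains a heavily over-parameterized fully-connected ReLU network with plain (full-batch) SGD on a small random regression set, and the printed training loss should decay roughly geometrically. The width, depth, learning rate, and iteration count are illustrative choices, not the polynomial bounds from the theorem.

```python
import torch

torch.manual_seed(0)
n, d, width, depth = 20, 10, 2048, 3  # n samples, input dim d, 3 hidden ReLU layers

# Non-degenerate inputs: random unit-norm points are well separated w.h.p.
X = torch.nn.functional.normalize(torch.randn(n, d), dim=1)
y = torch.randn(n, 1)  # regression targets

# Heavily over-parameterized fully-connected ReLU network
# (standard PyTorch initialization, not the paper's initialization scheme).
layers = [torch.nn.Linear(d, width), torch.nn.ReLU()]
for _ in range(depth - 1):
    layers += [torch.nn.Linear(width, width), torch.nn.ReLU()]
layers.append(torch.nn.Linear(width, 1))
net = torch.nn.Sequential(*layers)

opt = torch.optim.SGD(net.parameters(), lr=0.05)  # illustrative step size
loss_fn = torch.nn.MSELoss()

for t in range(1001):
    opt.zero_grad()
    loss = loss_fn(net(X), y)
    loss.backward()
    opt.step()
    if t % 200 == 0:
        # With enough width, the loss shrinks roughly geometrically in t.
        print(f"iter {t:4d}  training loss {loss.item():.3e}")
```

Shrinking `width` toward the input dimension in this sketch tends to break the clean geometric decay, which is consistent with over-parameterization being the key assumption.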
Author Information
Zeyuan Allen-Zhu (Microsoft Research AI)
Yuanzhi Li (Stanford)
Zhao Song (UT-Austin)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Poster: A Convergence Theory for Deep Learning via Over-Parameterization »
  Fri Jun 14th 01:30 -- 04:00 AM, Room: Pacific Ballroom
More from the Same Authors
- 2018 Poster: Learning long term dependencies via Fourier recurrent units »
  Jiong Zhang · Yibo Lin · Zhao Song · Inderjit Dhillon
- 2018 Poster: Towards Fast Computation of Certified Robustness for ReLU Networks »
  Tsui-Wei Weng · Huan Zhang · Hongge Chen · Zhao Song · Cho-Jui Hsieh · Luca Daniel · Duane Boning · Inderjit Dhillon
- 2018 Poster: Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits »
  Zeyuan Allen-Zhu · Sebastien Bubeck · Yuanzhi Li
- 2018 Oral: Towards Fast Computation of Certified Robustness for ReLU Networks »
  Tsui-Wei Weng · Huan Zhang · Hongge Chen · Zhao Song · Cho-Jui Hsieh · Luca Daniel · Duane Boning · Inderjit Dhillon
- 2018 Oral: Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits »
  Zeyuan Allen-Zhu · Sebastien Bubeck · Yuanzhi Li
- 2018 Oral: Learning long term dependencies via Fourier recurrent units »
  Jiong Zhang · Yibo Lin · Zhao Song · Inderjit Dhillon
- 2018 Poster: Katyusha X: Simple Momentum Method for Stochastic Sum-of-Nonconvex Optimization »
  Zeyuan Allen-Zhu
- 2018 Oral: Katyusha X: Simple Momentum Method for Stochastic Sum-of-Nonconvex Optimization »
  Zeyuan Allen-Zhu
- 2017 Poster: Near-Optimal Design of Experiments via Regret Minimization »
  Zeyuan Allen-Zhu · Yuanzhi Li · Aarti Singh · Yining Wang
- 2017 Talk: Near-Optimal Design of Experiments via Regret Minimization »
  Zeyuan Allen-Zhu · Yuanzhi Li · Aarti Singh · Yining Wang
- 2017 Poster: Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition »
  Zeyuan Allen-Zhu · Yuanzhi Li
- 2017 Poster: Faster Principal Component Regression and Stable Matrix Chebyshev Approximation »
  Zeyuan Allen-Zhu · Yuanzhi Li
- 2017 Poster: Natasha: Faster Non-Convex Stochastic Optimization Via Strongly Non-Convex Parameter »
  Zeyuan Allen-Zhu
- 2017 Talk: Natasha: Faster Non-Convex Stochastic Optimization Via Strongly Non-Convex Parameter »
  Zeyuan Allen-Zhu
- 2017 Talk: Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition »
  Zeyuan Allen-Zhu · Yuanzhi Li
- 2017 Talk: Faster Principal Component Regression and Stable Matrix Chebyshev Approximation »
  Zeyuan Allen-Zhu · Yuanzhi Li
- 2017 Poster: Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU »
  Zeyuan Allen-Zhu · Yuanzhi Li
- 2017 Poster: Recovery Guarantees for One-hidden-layer Neural Networks »
  Kai Zhong · Zhao Song · Prateek Jain · Peter Bartlett · Inderjit Dhillon
- 2017 Talk: Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU »
  Zeyuan Allen-Zhu · Yuanzhi Li
- 2017 Talk: Recovery Guarantees for One-hidden-layer Neural Networks »
  Kai Zhong · Zhao Song · Prateek Jain · Peter Bartlett · Inderjit Dhillon
- 2017 Tutorial: Recent Advances in Stochastic Convex and Non-Convex Optimization »
  Zeyuan Allen-Zhu