Oral
Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima
Simon Du · Jason Lee · Yuandong Tian · Aarti Singh · Barnabás Póczos
We consider the problem of learning a one-hidden-layer neural network with a non-overlapping convolutional layer and ReLU activation function, i.e., $f(Z; w, a) = \sum_j a_j\sigma(w^\top Z_j)$, in which both the convolutional weights $w$ and the output weights $a$ are parameters to be learned. We prove that with Gaussian input $Z$, there is a spurious local minimizer. Surprisingly, in the presence of the spurious local minimizer, starting from randomly initialized weights, gradient descent with weight normalization can still be proven to recover the true parameters with constant probability (which can be boosted to probability $1$ with multiple restarts). We also show that with constant probability, the same procedure can also converge to the spurious local minimum, showing that the local minimum plays a non-trivial role in the dynamics of gradient descent. Furthermore, a quantitative analysis shows that the gradient descent dynamics has two phases: it starts off slow, but converges much faster after several iterations.
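The model and training procedure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code: it assumes a finite sample and the empirical squared loss (the paper's analysis may use a different objective), it implements weight normalization as the reparametrization $w = v / \|v\|$, and the names `forward`, `loss_and_grads`, `gradient_descent`, the step size, the iteration count, and the problem sizes are all hypothetical choices made for illustration.

```python
import numpy as np


def forward(Z, v, a):
    """Compute f(Z; w, a) = sum_j a_j * relu(w^T Z_j) with w = v / ||v||.

    Z: (n, k, p) array of n inputs, each split into k non-overlapping patches of size p.
    v: (p,) unnormalized filter (weight normalization divides by its norm).
    a: (k,) output weights.
    """
    w = v / np.linalg.norm(v)
    pre = Z @ w                          # (n, k) pre-activations w^T Z_j
    return np.maximum(pre, 0.0) @ a      # (n,) network outputs


def loss_and_grads(Z, y, v, a):
    """Empirical squared loss 0.5 * mean((f - y)^2) and its gradients in v and a."""
    n = Z.shape[0]
    norm = np.linalg.norm(v)
    w = v / norm
    pre = Z @ w
    hidden = np.maximum(pre, 0.0)
    resid = hidden @ a - y
    loss = 0.5 * np.mean(resid ** 2)
    grad_a = hidden.T @ resid / n
    # Gradient with respect to the normalized filter w.
    coef = resid[:, None] * (pre > 0) * a[None, :]      # (n, k)
    grad_w = np.einsum("nk,nkp->p", coef, Z) / n
    # Chain rule through w = v / ||v||: project out the component along w.
    grad_v = (grad_w - (grad_w @ w) * w) / norm
    return loss, grad_v, grad_a


def gradient_descent(Z, y, v0, a0, lr=0.05, steps=500):
    """Plain gradient descent on (v, a); lr and steps are illustrative only."""
    v, a = v0.copy(), a0.copy()
    for _ in range(steps):
        _, grad_v, grad_a = loss_and_grads(Z, y, v, a)
        v -= lr * grad_v
        a -= lr * grad_a
    return v, a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k, p = 2000, 4, 6                      # hypothetical problem sizes
    Z = rng.standard_normal((n, k, p))        # Gaussian input
    w_star = rng.standard_normal(p)           # teacher filter
    a_star = rng.standard_normal(k)           # teacher output weights
    y = forward(Z, w_star, a_star)

    # Multiple random restarts: keep the run with the smallest final loss.
    best_loss = np.inf
    for _ in range(5):
        v, a = gradient_descent(Z, y, rng.standard_normal(p), 0.1 * rng.standard_normal(k))
        final_loss, _, _ = loss_and_grads(Z, y, v, a)
        best_loss = min(best_loss, final_loss)
    print("best final loss over restarts:", best_loss)
```

Any single run of this sketch may end near the teacher parameters or near a different stationary point, which is in line with the abstract's statement that recovery holds only with constant probability per initialization; the restart loop mirrors the suggested way of boosting that probability.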
Author Information
Simon Du (Carnegie Mellon University)
Jason Lee (University of Southern California)
Yuandong Tian (Facebook AI Research)
Aarti Singh (Carnegie Mellon University)
Barnabás Póczos (Carnegie Mellon University)
Related Events (a corresponding poster, oral, or spotlight)
-
2018 Poster: Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima »
Thu Jul 12th 04:15 -- 07:00 PM Room Hall B
More from the Same Authors
-
2020 Poster: VideoOneNet: Bidirectional Convolutional Recurrent OneNet with Trainable Data Steps for Video Processing »
Zoltán Á. Milacski · Barnabás Póczos · Andras Lorincz -
2020 Poster: Student Specialization in Deep Rectified Networks With Finite Width and Input Dimension »
Yuandong Tian -
2019 Poster: Width Provably Matters in Optimization for Deep Linear Neural Networks »
Simon Du · Wei Hu -
2019 Oral: Width Provably Matters in Optimization for Deep Linear Neural Networks »
Simon Du · Wei Hu -
2019 Poster: Myopic Posterior Sampling for Adaptive Goal Oriented Design of Experiments »
Kirthevasan Kandasamy · Willie Neiswanger · Reed Zhang · Akshay Krishnamurthy · Jeff Schneider · Barnabás Póczos -
2019 Oral: Myopic Posterior Sampling for Adaptive Goal Oriented Design of Experiments »
Kirthevasan Kandasamy · Willie Neiswanger · Reed Zhang · Akshay Krishnamurthy · Jeff Schneider · Barnabás Póczos -
2019 Poster: Provably efficient RL with Rich Observations via Latent State Decoding »
Simon Du · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal · Miroslav Dudik · John Langford -
2019 Poster: Gradient Descent Finds Global Minima of Deep Neural Networks »
Simon Du · Jason Lee · Haochuan Li · Liwei Wang · Xiyu Zhai -
2019 Poster: ELF OpenGo: an analysis and open reimplementation of AlphaZero »
Yuandong Tian · Jerry Ma · Qucheng Gong · Shubho Sengupta · Zhuoyuan Chen · James Pinkerton · Larry Zitnick -
2019 Poster: Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Ruosong Wang -
2019 Poster: Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models »
Mor Shpigel Nacson · Suriya Gunasekar · Jason Lee · Nati Srebro · Daniel Soudry -
2019 Oral: Provably efficient RL with Rich Observations via Latent State Decoding »
Simon Du · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal · Miroslav Dudik · John Langford -
2019 Oral: Gradient Descent Finds Global Minima of Deep Neural Networks »
Simon Du · Jason Lee · Haochuan Li · Liwei Wang · Xiyu Zhai -
2019 Oral: Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Ruosong Wang -
2019 Oral: Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models »
Mor Shpigel Nacson · Suriya Gunasekar · Jason Lee · Nati Srebro · Daniel Soudry -
2019 Oral: ELF OpenGo: an analysis and open reimplementation of AlphaZero »
Yuandong Tian · Jerry Ma · Qucheng Gong · Shubho Sengupta · Zhuoyuan Chen · James Pinkerton · Larry Zitnick -
2018 Poster: On the Power of Over-parametrization in Neural Networks with Quadratic Activation »
Simon Du · Jason Lee -
2018 Poster: Fast and Sample Efficient Inductive Matrix Completion via Multi-Phase Procrustes Flow »
Xiao Zhang · Simon Du · Quanquan Gu -
2018 Poster: Gradient Primal-Dual Algorithm Converges to Second-Order Stationary Solution for Nonconvex Distributed Optimization Over Networks »
Mingyi Hong · Meisam Razaviyayn · Jason Lee -
2018 Poster: Transformation Autoregressive Networks »
Junier Oliva · Kumar Avinava Dubey · Manzil Zaheer · Barnabás Póczos · Ruslan Salakhutdinov · Eric Xing · Jeff Schneider -
2018 Oral: Fast and Sample Efficient Inductive Matrix Completion via Multi-Phase Procrustes Flow »
Xiao Zhang · Simon Du · Quanquan Gu -
2018 Oral: On the Power of Over-parametrization in Neural Networks with Quadratic Activation »
Simon Du · Jason Lee -
2018 Oral: Gradient Primal-Dual Algorithm Converges to Second-Order Stationary Solution for Nonconvex Distributed Optimization Over Networks »
Mingyi Hong · Meisam Razaviyayn · Jason Lee -
2018 Oral: Transformation Autoregressive Networks »
Junier Oliva · Kumar Avinava Dubey · Manzil Zaheer · Barnabás Póczos · Ruslan Salakhutdinov · Eric Xing · Jeff Schneider -
2018 Poster: Discrete-Continuous Mixtures in Probabilistic Programming: Generalized Semantics and Inference Algorithms »
Yi Wu · Siddharth Srivastava · Nicholas Hay · Simon Du · Stuart Russell -
2018 Poster: Characterizing Implicit Bias in Terms of Optimization Geometry »
Suriya Gunasekar · Jason Lee · Daniel Soudry · Nati Srebro -
2018 Oral: Discrete-Continuous Mixtures in Probabilistic Programming: Generalized Semantics and Inference Algorithms »
Yi Wu · Siddharth Srivastava · Nicholas Hay · Simon Du · Stuart Russell -
2018 Oral: Characterizing Implicit Bias in Terms of Optimization Geometry »
Suriya Gunasekar · Jason Lee · Daniel Soudry · Nati Srebro -
2017 Poster: Multi-fidelity Bayesian Optimisation with Continuous Approximations »
Kirthevasan Kandasamy · Gautam Dasarathy · Barnabás Póczos · Jeff Schneider -
2017 Talk: Multi-fidelity Bayesian Optimisation with Continuous Approximations »
Kirthevasan Kandasamy · Gautam Dasarathy · Barnabás Póczos · Jeff Schneider -
2017 Poster: An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis »
Yuandong Tian -
2017 Poster: Stochastic Variance Reduction Methods for Policy Evaluation »
Simon Du · Jianshu Chen · Lihong Li · Lin Xiao · Dengyong Zhou -
2017 Poster: The Statistical Recurrent Unit »
Junier Oliva · Barnabás Póczos · Jeff Schneider -
2017 Poster: Nonparanormal Information Estimation »
Shashank Singh · Barnabás Póczos -
2017 Talk: Nonparanormal Information Estimation »
Shashank Singh · Barnabás Póczos -
2017 Talk: An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis »
Yuandong Tian -
2017 Talk: The Statistical Recurrent Unit »
Junier Oliva · Barnabás Póczos · Jeff Schneider -
2017 Talk: Stochastic Variance Reduction Methods for Policy Evaluation »
Simon Du · Jianshu Chen · Lihong Li · Lin Xiao · Dengyong Zhou -
2017 Poster: Equivariance Through Parameter-Sharing »
Siamak Ravanbakhsh · Jeff Schneider · Barnabás Póczos -
2017 Talk: Equivariance Through Parameter-Sharing »
Siamak Ravanbakhsh · Jeff Schneider · Barnabás Póczos