Poster
Width Provably Matters in Optimization for Deep Linear Neural Networks
Simon Du · Wei Hu
We prove that for an $L$-layer fully-connected linear neural network, if the width of every hidden layer is $\widetilde{\Omega}\left(L \cdot r \cdot d_{out} \cdot \kappa^3 \right)$, where $r$ and $\kappa$ are the rank and the condition number of the input data, and $d_{out}$ is the output dimension, then gradient descent with Gaussian random initialization converges to a global minimum at a linear rate. The number of iterations to find an $\epsilon$-suboptimal solution is $O(\kappa \log(\frac{1}{\epsilon}))$. Our polynomial upper bound on the total running time for wide deep linear networks and the $\exp\left(\Omega\left(L\right)\right)$ lower bound for narrow deep linear neural networks [Shamir, 2018] together demonstrate that wide layers are necessary for optimizing deep models.
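As a toy illustration of the theorem's message (not the paper's exact setting, scaling, or constants), the NumPy sketch below runs gradient descent from Gaussian random initialization on a 3-layer linear network whose hidden width ($m = 100$) far exceeds the problem dimensions, with whitened inputs so that $\kappa = 1$. The training loss decays geometrically, consistent with the predicted linear convergence rate. All dimensions, the $1/\sqrt{m}$ init scaling, and the step size are choices made for this demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (chosen for this demo): d_in inputs, d_out outputs,
# hidden width m much larger than d_in and d_out.
d_in, d_out, m = 5, 3, 100

# Whitened inputs: an orthogonal X gives rank r = d_in and condition number 1.
X = np.linalg.qr(rng.standard_normal((d_in, d_in)))[0]
A = rng.standard_normal((d_out, d_in))   # ground-truth linear map
Y = A @ X

# Gaussian random init; N(0, 1/m) entries keep each layer roughly
# norm-preserving (a common scaling, not necessarily the paper's).
W1 = rng.standard_normal((m, d_in)) / np.sqrt(m)
W2 = rng.standard_normal((m, m)) / np.sqrt(m)
W3 = rng.standard_normal((d_out, m)) / np.sqrt(m)

lr, losses = 0.05, []
for _ in range(1000):
    H1 = W1 @ X                          # layer-1 activations
    H2 = W2 @ H1                         # layer-2 activations
    E = W3 @ H2 - Y                      # residual of the end-to-end map
    losses.append(0.5 * np.sum(E ** 2))
    # Gradients of the squared loss w.r.t. each weight matrix.
    G3 = E @ H2.T
    G2 = W3.T @ E @ H1.T
    G1 = W2.T @ W3.T @ E @ X.T
    W1 -= lr * G1
    W2 -= lr * G2
    W3 -= lr * G3

# Roughly constant ratio between checkpoints indicates a linear rate.
print(losses[0], losses[200], losses[400], losses[-1])
```

Shrinking the hidden width toward $d_{out}$, or stacking many more layers at a narrow width, makes this vanilla setup far harder to tune, which is the regime where the $\exp(\Omega(L))$ lower bound of Shamir [2018] bites.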
Author Information
Simon Du (Carnegie Mellon University)
Wei Hu (Princeton University)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Oral: Width Provably Matters in Optimization for Deep Linear Neural Networks
  Thu. Jun 13th 04:30 -- 04:35 PM, Room 104
More from the Same Authors
- 2022 Poster: More Than a Toy: Random Matrix Models Predict How Real-World Neural Representations Generalize
  Alexander Wei · Wei Hu · Jacob Steinhardt
- 2022 Spotlight: More Than a Toy: Random Matrix Models Predict How Real-World Neural Representations Generalize
  Alexander Wei · Wei Hu · Jacob Steinhardt
- 2021 Poster: A Representation Learning Perspective on the Importance of Train-Validation Splitting in Meta-Learning
  Nikunj Umesh Saunshi · Arushi Gupta · Wei Hu
- 2021 Spotlight: A Representation Learning Perspective on the Importance of Train-Validation Splitting in Meta-Learning
  Nikunj Umesh Saunshi · Arushi Gupta · Wei Hu
- 2021 Poster: Near-Optimal Linear Regression under Distribution Shift
  Qi Lei · Wei Hu · Jason Lee
- 2021 Spotlight: Near-Optimal Linear Regression under Distribution Shift
  Qi Lei · Wei Hu · Jason Lee
- 2019 Poster: Provably efficient RL with Rich Observations via Latent State Decoding
  Simon Du · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal · Miroslav Dudik · John Langford
- 2019 Poster: Gradient Descent Finds Global Minima of Deep Neural Networks
  Simon Du · Jason Lee · Haochuan Li · Liwei Wang · Xiyu Zhai
- 2019 Poster: Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
  Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Ruosong Wang
- 2019 Oral: Provably efficient RL with Rich Observations via Latent State Decoding
  Simon Du · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal · Miroslav Dudik · John Langford
- 2019 Oral: Gradient Descent Finds Global Minima of Deep Neural Networks
  Simon Du · Jason Lee · Haochuan Li · Liwei Wang · Xiyu Zhai
- 2019 Oral: Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
  Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Ruosong Wang
- 2018 Poster: On the Power of Over-parametrization in Neural Networks with Quadratic Activation
  Simon Du · Jason Lee
- 2018 Poster: Fast and Sample Efficient Inductive Matrix Completion via Multi-Phase Procrustes Flow
  Xiao Zhang · Simon Du · Quanquan Gu
- 2018 Oral: Fast and Sample Efficient Inductive Matrix Completion via Multi-Phase Procrustes Flow
  Xiao Zhang · Simon Du · Quanquan Gu
- 2018 Oral: On the Power of Over-parametrization in Neural Networks with Quadratic Activation
  Simon Du · Jason Lee
- 2018 Poster: Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima
  Simon Du · Jason Lee · Yuandong Tian · Aarti Singh · Barnabás Póczos
- 2018 Poster: Discrete-Continuous Mixtures in Probabilistic Programming: Generalized Semantics and Inference Algorithms
  Yi Wu · Siddharth Srivastava · Nicholas Hay · Simon Du · Stuart Russell
- 2018 Oral: Discrete-Continuous Mixtures in Probabilistic Programming: Generalized Semantics and Inference Algorithms
  Yi Wu · Siddharth Srivastava · Nicholas Hay · Simon Du · Stuart Russell
- 2018 Oral: Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima
  Simon Du · Jason Lee · Yuandong Tian · Aarti Singh · Barnabás Póczos
- 2017 Poster: Stochastic Variance Reduction Methods for Policy Evaluation
  Simon Du · Jianshu Chen · Lihong Li · Lin Xiao · Dengyong Zhou
- 2017 Talk: Stochastic Variance Reduction Methods for Policy Evaluation
  Simon Du · Jianshu Chen · Lihong Li · Lin Xiao · Dengyong Zhou