Poster
An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis
Yuandong Tian
In this paper, we explore theoretical properties of training a two-layered ReLU network $g(\mathbf{x}; \mathbf{w}) = \sum_{j=1}^K \sigma(\mathbf{w}_j^\top \mathbf{x})$ with centered $d$-dimensional spherical Gaussian input $\mathbf{x}$ ($\sigma$ = ReLU). We train the network with gradient descent on $\mathbf{w}$ to mimic the output of a teacher network with the same architecture and fixed parameters $\mathbf{w}^*$. We show that its population gradient has an analytical formula, which enables a theoretical analysis of critical points and convergence behavior. First, we prove that critical points outside the hyperplane spanned by the teacher parameters ("out-of-plane") are not isolated and form manifolds, and we characterize in-plane critical-point-free regions for the two-ReLU case. Second, for one ReLU node, convergence to $\mathbf{w}^*$ is guaranteed with probability at least $(1-\epsilon)/2$ if the weights are initialized randomly with standard deviation upper-bounded by $O(\epsilon/\sqrt{d})$, in accordance with empirical practice. For networks with many ReLU nodes, we prove that an infinitesimal perturbation of the weight initialization results in convergence towards $\mathbf{w}^*$ (or a permutation of it), a phenomenon known in physics as spontaneous symmetry breaking (SSB). We do not assume independence of the ReLU activations. Simulations verify our findings.
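The teacher-student setup described in the abstract can be sketched numerically for the one-ReLU case. This is only an illustration under stated assumptions, not the paper's method: it uses an empirical gradient over a finite Gaussian sample in place of the analytical population gradient, and the sample size, learning rate, step count, initialization scale, and all function names are hypothetical choices made for the example.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(X, W):
    """g(x; w) = sum_j relu(w_j^T x) for each row x of X; W has shape (K, d)."""
    return relu(X @ W.T).sum(axis=1)

def empirical_loss(X, W, W_star):
    """Half mean squared difference between student and teacher outputs."""
    r = forward(X, W) - forward(X, W_star)
    return 0.5 * np.mean(r ** 2)

def empirical_grad(X, W, W_star):
    """Gradient of empirical_loss with respect to W, shape (K, d)."""
    r = forward(X, W) - forward(X, W_star)    # residuals, shape (N,)
    active = (X @ W.T > 0).astype(X.dtype)    # ReLU gating, shape (N, K)
    return (active * r[:, None]).T @ X / X.shape[0]

def train(X, W0, W_star, lr=0.1, steps=300):
    """Plain full-batch gradient descent on the student weights."""
    W = W0.copy()
    losses = [empirical_loss(X, W, W_star)]
    for _ in range(steps):
        W = W - lr * empirical_grad(X, W, W_star)
        losses.append(empirical_loss(X, W, W_star))
    return W, losses

# Illustrative values (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
d, N = 10, 5000
X = rng.standard_normal((N, d))               # centered spherical Gaussian input
W_star = rng.standard_normal((1, d))          # teacher with one ReLU node
W_star /= np.linalg.norm(W_star)
sigma = 0.01 / np.sqrt(d)                     # small init, scaled like eps / sqrt(d)
W0 = sigma * rng.standard_normal((1, d))

W, losses = train(X, W0, W_star)
print("initial loss:", losses[0])
print("final loss:  ", losses[-1])
print("distance to teacher:", np.linalg.norm(W - W_star))
```

Because the guarantee quoted in the abstract is probabilistic, a given random initialization need not end near $\mathbf{w}^*$; rerunning with different seeds gives a rough empirical view of how often it does.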
Author Information
Yuandong Tian (Facebook AI Research)
Related Events (a corresponding poster, oral, or spotlight)
- 2017 Talk: An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis
  Tue. Aug 8th 04:06 -- 04:24 AM Room Parkside 2
More from the Same Authors
- 2021: Learning Space Partitions for Path Planning
  Kevin Yang · Tianjun Zhang · Chris Cummins · Brandon Cui · Benoit Steiner · Linnan Wang · Joseph E Gonzalez · Dan Klein · Yuandong Tian
- 2022 Poster: Denoised MDPs: Learning World Models Better Than the World Itself
  Tongzhou Wang · Simon Du · Antonio Torralba · Phillip Isola · Amy Zhang · Yuandong Tian
- 2022 Spotlight: Denoised MDPs: Learning World Models Better Than the World Itself
  Tongzhou Wang · Simon Du · Antonio Torralba · Phillip Isola · Amy Zhang · Yuandong Tian
- 2021: RL + Operations Research Panel
  Jim Dai · Fei Fang · Shie Mannor · Yuandong Tian · Zhiwei (Tony) Qin · Zongqing Lu
- 2021 Poster: Learn-to-Share: A Hardware-friendly Transfer Learning Framework Exploiting Computation and Parameter Sharing
  Cheng Fu · Hanxian Huang · Xinyun Chen · Yuandong Tian · Jishen Zhao
- 2021 Oral: Learn-to-Share: A Hardware-friendly Transfer Learning Framework Exploiting Computation and Parameter Sharing
  Cheng Fu · Hanxian Huang · Xinyun Chen · Yuandong Tian · Jishen Zhao
- 2021 Poster: Understanding self-supervised learning dynamics without contrastive pairs
  Yuandong Tian · Xinlei Chen · Surya Ganguli
- 2021 Oral: Understanding self-supervised learning dynamics without contrastive pairs
  Yuandong Tian · Xinlei Chen · Surya Ganguli
- 2021 Poster: Few-Shot Neural Architecture Search
  Yiyang Zhao · Linnan Wang · Yuandong Tian · Rodrigo Fonseca · Tian Guo
- 2021 Oral: Few-Shot Neural Architecture Search
  Yiyang Zhao · Linnan Wang · Yuandong Tian · Rodrigo Fonseca · Tian Guo
- 2020 Poster: Student Specialization in Deep Rectified Networks With Finite Width and Input Dimension
  Yuandong Tian
- 2019 Poster: ELF OpenGo: an analysis and open reimplementation of AlphaZero
  Yuandong Tian · Jerry Ma · Qucheng Gong · Shubho Sengupta · Zhuoyuan Chen · James Pinkerton · Larry Zitnick
- 2019 Oral: ELF OpenGo: an analysis and open reimplementation of AlphaZero
  Yuandong Tian · Jerry Ma · Qucheng Gong · Shubho Sengupta · Zhuoyuan Chen · James Pinkerton · Larry Zitnick
- 2018 Poster: Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima
  Simon Du · Jason Lee · Yuandong Tian · Aarti Singh · Barnabás Póczos
- 2018 Oral: Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima
  Simon Du · Jason Lee · Yuandong Tian · Aarti Singh · Barnabás Póczos