On Non-local Convergence Analysis of Deep Linear Networks

Kun Chen · Dachao Lin · Zhihua Zhang

Hall E #223

Keywords: [ Optimization ] [ DL: Theory ] [ T: Deep Learning ] [ Deep Learning ]

[ Abstract ]
[ Poster [ Paper PDF
Wed 20 Jul 3:30 p.m. PDT — 5:30 p.m. PDT
Spotlight presentation: Deep Learning
Wed 20 Jul 7:30 a.m. PDT — 9 a.m. PDT


In this paper, we study the non-local convergence properties of deep linear networks. Specifically, under the quadratic loss, we consider optimizing deep linear networks in which there is at least a layer with only one neuron. We describe the convergent point of trajectories with an arbitrary balanced starting point under gradient flow, including the paths which converge to one of the saddle points. We also show specific convergence rates of trajectories that converge to the global minimizers by stages. We conclude that the rates vary from polynomial to linear. As far as we know, our results are the first to give a non-local analysis of deep linear neural networks with arbitrary balanced initialization, rather than the lazy training regime which has dominated the literature on neural networks or the restricted benign initialization.

Chat is not available.