An Investigation into Neural Net Optimization via Hessian Eigenvalue Density

Behrooz Ghorbani · Shankar Krishnan · Ying Xiao

Pacific Ballroom #51

Keywords: [ Optimization ] [ Non-convex Optimization ] [ Algorithms ]


To understand the dynamics of training in deep neural networks, we study the evolution of the Hessian eigenvalue density throughout the optimization process. In non-batch normalized networks, we observe the rapid appearance of large isolated eigenvalues in the spectrum, along with a surprising concentration of the gradient in the corresponding eigenspaces. In a batch normalized network, these two effects are almost absent. We give a theoretical rationale to partially explain these phenomena. As part of this work, we adapt advanced tools from numerical linear algebra that allow scalable and accurate estimation of the entire Hessian spectrum of ImageNet-scale neural networks; this technique may be of independent interest in other applications.
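The scalable spectrum-estimation tool referred to above is stochastic Lanczos quadrature: run the Lanczos recurrence from random probe vectors using only Hessian-vector products, then read approximate eigenvalue locations and density weights off the resulting tridiagonal matrices. A minimal sketch (an illustrative assumption, not the authors' exact implementation; the toy diagonal "Hessian" `H` and the function names are hypothetical) could look like:

```python
# Stochastic Lanczos quadrature sketch for estimating the eigenvalue
# density of a symmetric operator accessed only via matrix-vector products.
import numpy as np

def lanczos(matvec, dim, num_steps, rng):
    """Lanczos three-term recurrence from a random unit start vector.
    Returns the tridiagonal matrix T (alphas on the diagonal, betas off)."""
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    v_prev = np.zeros(dim)
    beta = 0.0
    alphas, betas = [], []
    for _ in range(num_steps):
        w = matvec(v) - beta * v_prev
        alpha = float(np.dot(w, v))
        w -= alpha * v
        beta = float(np.linalg.norm(w))
        alphas.append(alpha)
        betas.append(beta)
        if beta < 1e-10:  # invariant subspace found; stop early
            break
        v_prev, v = v, w / beta
    T = np.diag(alphas)
    off = betas[: len(alphas) - 1]
    return T + np.diag(off, 1) + np.diag(off, -1)

def spectral_density(matvec, dim, num_probes=10, num_steps=30, seed=0):
    """Average Gaussian-quadrature nodes/weights over random probes.
    Nodes approximate eigenvalue locations; weights their density mass."""
    rng = np.random.default_rng(seed)
    nodes, weights = [], []
    for _ in range(num_probes):
        T = lanczos(matvec, dim, num_steps, rng)
        evals, evecs = np.linalg.eigh(T)
        nodes.append(evals)
        weights.append(evecs[0] ** 2 / num_probes)
    return np.concatenate(nodes), np.concatenate(weights)

# Toy check: a diagonal "Hessian" whose spectrum is known exactly.
H = np.diag(np.linspace(-1.0, 5.0, 200))
nodes, weights = spectral_density(lambda v: H @ v, dim=200)
```

The key property that makes this ImageNet-scale is that `matvec` never needs the Hessian explicitly; in a deep network it would be a Hessian-vector product computed by automatic differentiation. The quadrature weights sum to one, and the nodes lie inside the true spectral interval, so a smoothed histogram of (nodes, weights) approximates the full eigenvalue density.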