
Poster in Workshop: High-dimensional Learning Dynamics Workshop: The Emergence of Structure and Reasoning

Closed form of the Hessian spectrum for some Neural Networks

Sidak Pal Singh · Thomas Hofmann


Abstract:

The Hessian matrix and its spectrum hold significant theoretical and practical relevance, as they capture the pairwise interactions of the parameters; consequently, they have been widely used in building preconditioned optimizers, measuring generalization performance, studying the effect of the learning rate and other hyperparameters, optimally pruning parameters, and more. Given this versatility and importance, several prior works have tried to characterize the Hessian spectrum through its spectral density, its rank, and descriptions of the outliers and bulk of its spectrum, often resorting to approximations based on random matrix theory. However, precisely how the top eigenvalue behaves has remained unclear, let alone the corresponding eigenvectors, due to the lack of a closed form for any non-trivial class of neural networks. Likewise, given the acute cost of empirically estimating the Hessian or its various spectral measures (such as the top eigenvalue, trace, and determinant), our understanding of their behaviour remains somewhat muddled. In this work, we derive a closed form for all the eigenvalues and their corresponding eigenvectors of one-hidden-layer, linear as well as ReLU, uni-dimensional networks with arbitrary hidden-layer width, for the loss aggregated over any number of samples. As a consequence of these theoretical results, we shed light on the previously undiscovered 'paired' nature of the outlier eigenvalues in the spectrum, the grouped composition of the trace, and a cell-wise decomposition of the Hessian spectrum under ReLU.
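
To make the setting concrete, here is a minimal sketch (not the authors' code) of the class of models the abstract describes: a one-hidden-layer, uni-dimensional network with hidden width m, whose full Hessian over a batch of n samples is small enough to eigendecompose exactly. The parameterization, random data, and all variable names are illustrative assumptions for this sketch.

```python
import jax
import jax.numpy as jnp

# Illustrative setup: a one-hidden-layer, uni-dimensional linear network
# f(x) = sum_i v_i * w_i * x with hidden width m, squared loss aggregated
# over n scalar samples. Data here is random, purely for demonstration.
m, n = 8, 32
key = jax.random.PRNGKey(0)
kw, kv, kx, ky = jax.random.split(key, 4)
w = jax.random.normal(kw, (m,))   # input-to-hidden weights
v = jax.random.normal(kv, (m,))   # hidden-to-output weights
x = jax.random.normal(kx, (n,))   # scalar inputs
y = jax.random.normal(ky, (n,))   # scalar targets

def loss(params):
    w, v = params[:m], params[m:]
    preds = (v @ w) * x           # linear net: f(x) = (v . w) * x
    # For the ReLU variant discussed in the abstract, one would instead use:
    # preds = jnp.maximum(x[:, None] * w, 0.0) @ v
    return 0.5 * jnp.mean((preds - y) ** 2)

params = jnp.concatenate([w, v])
H = jax.hessian(loss)(params)     # full (2m x 2m) Hessian of the loss
eigvals = jnp.linalg.eigvalsh(H)  # exact spectrum of this toy model
print(eigvals)                    # inspect outlier vs. bulk eigenvalues
```

At this scale the exact eigendecomposition is cheap, so one can numerically check phenomena like the paired outlier eigenvalues against the paper's closed-form predictions; for realistic networks such direct computation is infeasible, which is the cost issue the abstract points to.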
