

Poster in Workshop: HiLD: High-dimensional Learning Dynamics Workshop

The phases of large learning rate gradient descent through effective parameters

Lawrence Wang · Stephen Roberts


Abstract:

Modern neural networks are undeniably successful. Numerous works study how the curvature of the loss landscape affects the quality of solutions. In this work we consider the Hessian matrix during network training with large learning rates, an attractive learning regime that is (in)famously unstable. Through the connection between "well-determined" or "effective" parameters and the performance of neural networks, we study the instabilities of gradient descent and characterise the phases these instabilities pass through. With a connection to the loss basin, we observe distinct regimes of Hessian rotation during these instabilities and a general tendency towards solution flattening.
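The "well-determined" or "effective" parameter count mentioned in the abstract is commonly defined from the Hessian eigenvalues as N_eff(H, α) = Σ_i λ_i / (λ_i + α), with a damping or prior-precision constant α (as in MacKay's evidence framework and later work on parameter counting). The sketch below illustrates that definition as an assumption about the quantity used here; it is not the authors' implementation, and the eigenvalues and α value are purely illustrative.

```python
import numpy as np

def effective_parameters(hessian_eigvals, alpha=1.0):
    """Effective (well-determined) parameter count from Hessian eigenvalues.

    Computes N_eff = sum_i lam_i / (lam_i + alpha). Eigenvalues much larger
    than alpha contribute ~1 (well-determined directions); eigenvalues much
    smaller than alpha contribute ~0 (poorly determined directions).
    """
    lam = np.clip(np.asarray(hessian_eigvals, dtype=float), 0.0, None)
    return float(np.sum(lam / (lam + alpha)))

# Toy spectrum: a few sharp directions plus many near-zero ones.
# N_eff ends up close to the number of sharp directions (~2.9 here).
eigvals = np.concatenate([np.array([50.0, 20.0, 10.0]), 1e-3 * np.ones(100)])
print(effective_parameters(eigvals, alpha=1.0))
```

Under this definition, flattening of the solution (shrinking Hessian eigenvalues) directly lowers the effective parameter count, which is one way to read the abstract's link between curvature and instability phases.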
