

Poster

Global curvature for second-order optimization of neural networks

Alberto Bernacchia

East Exhibition Hall A-B #E-2112
Tue 15 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract: Second-order optimization methods, which leverage the local curvature of the loss function, have the potential to dramatically accelerate the training of machine learning models. However, these methods are often hindered by the computational burden of constructing and inverting large curvature matrices with $\mathcal{O}(p^2)$ elements, where $p$ is the number of parameters. In this work, we present a theory that predicts the \emph{exact} structure of the global curvature by leveraging the intrinsic symmetries of neural networks, such as invariance under parameter permutations. For Multi-Layer Perceptrons (MLPs), our approach reveals that the global curvature can be expressed in terms of $\mathcal{O}(d^2 + L^2)$ independent factors, where $d$ is the number of input/output dimensions and $L$ is the number of layers, significantly reducing the computational burden compared to the $\mathcal{O}(p^2)$ elements of the full matrix. These factors can be estimated efficiently, enabling precise curvature computations. To evaluate the practical implications of our framework, we apply second-order optimization to synthetic data, achieving markedly faster convergence compared to traditional optimization methods. Our findings pave the way for a better understanding of the loss landscape of neural networks, and for designing more efficient training methodologies in deep learning. Code: \href{https://github.com/mtkresearch/symo_notebooks}{github.com/mtkresearch/symo\_notebooks}
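The abstract's key ingredient is the permutation symmetry of MLPs: relabeling hidden units (and permuting the corresponding weights) leaves the loss unchanged, which constrains the structure of the curvature. The toy NumPy sketch below only illustrates this invariance for a one-hidden-layer MLP; it is not the paper's curvature construction, and all names and choices here (mlp_loss, W1, W2, tanh activations, squared-error loss) are illustrative assumptions rather than anything taken from the paper.

```python
# Toy sketch (not the paper's method): permuting the hidden units of an MLP,
# together with the matching permutation of its weights, leaves the loss unchanged.
import numpy as np

rng = np.random.default_rng(0)
d, h = 3, 5                      # input/output dimension, hidden width

def mlp_loss(W1, W2, X, Y):
    """Squared-error loss of a one-hidden-layer tanh MLP."""
    H = np.tanh(X @ W1)          # hidden activations, shape (n, h)
    pred = H @ W2                # predictions, shape (n, d)
    return 0.5 * np.mean(np.sum((pred - Y) ** 2, axis=1))

X = rng.normal(size=(100, d))
Y = rng.normal(size=(100, d))
W1 = rng.normal(size=(d, h))
W2 = rng.normal(size=(h, d))

# Relabel hidden units: permute the columns of W1 and the rows of W2 consistently.
perm = rng.permutation(h)
loss_original = mlp_loss(W1, W2, X, Y)
loss_permuted = mlp_loss(W1[:, perm], W2[perm, :], X, Y)
print(np.isclose(loss_original, loss_permuted))   # True: loss is permutation invariant
```

Loosely speaking, because the loss is invariant under such relabelings, curvature averaged over a permutation-invariant distribution of parameters inherits the same symmetry, which is the kind of constraint that can reduce the number of independent entries well below $\mathcal{O}(p^2)$; the paper's exact construction is given in the linked notebooks.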

Lay Summary:

Training machine learning models can be slow, but advanced optimization techniques that use the "curvature" (or shape) of the loss function could speed things up. However, these methods usually require a lot of computation. In this work, we discovered that the built-in symmetries of neural networks (like how rearranging some neurons doesn’t change the output) simplify these calculations. The curvature can be described using far fewer values than expected, making computations much faster. This research helps us better understand how neural networks learn and could lead to faster and more efficient training methods in the future.
