Talk in Workshop: Beyond first order methods in machine learning systems
Talk by Rio Yokota - Degree of Approximation and Overhead of Computing Curvature, Information, and Noise Matrices
Rio Yokota
Abstract:
Hessian, Fisher, and covariance matrices are used not only for preconditioning optimizers, but also for computing generalization metrics, predicting hyperparameters, and performing Bayesian inference. These matrices contain valuable information that can advance the theory of statistical learning, but they are prohibitively expensive to compute exactly for modern deep neural networks with billions of parameters. We use a highly optimized implementation to compute these matrices at various degrees of approximation, closing the gap between theory and practice. We significantly reduce the overhead of computing these matrices through a hybrid data-parallel + model-parallel approach.
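To make the "degree of approximation" trade-off concrete, the sketch below (my illustration, not code from the talk; it assumes PyTorch, and all variable names are made up) shows one widely used approximation of this kind: the Kronecker-factored (K-FAC) approximation of a single linear layer's Fisher block. The exact block for a weight of shape (d_out, d_in) has (d_out*d_in)^2 entries, while the Kronecker factors need only d_in^2 + d_out^2.

    import torch

    torch.manual_seed(0)
    n, d_in, d_out = 256, 64, 32          # batch size and layer dimensions
    x = torch.randn(n, d_in)              # layer inputs (activations)
    W = torch.randn(d_out, d_in, requires_grad=True)

    # Forward pass through the linear layer and a dummy scalar loss.
    s = x @ W.t()                         # pre-activations, shape (n, d_out)
    s.retain_grad()                       # keep gradients of this non-leaf tensor
    loss = s.pow(2).mean()                # stand-in for a real training loss
    loss.backward()
    g = s.grad * n                        # per-example pre-activation gradients

    # Kronecker factors: A ~ E[x x^T] over inputs, G ~ E[g g^T] over gradients.
    A = x.t() @ x / n                     # shape (d_in, d_in)
    G = g.t() @ g / n                     # shape (d_out, d_out)

    # K-FAC approximates the layer's Fisher block E[(x kron g)(x kron g)^T]
    # by kron(A, G). Materialized here only to show the shape; in practice
    # one works with the small factors A and G directly.
    F_kfac = torch.kron(A, G)
    print(F_kfac.shape)                   # torch.Size([2048, 2048])

Storing and inverting the two small factors instead of the full block is what makes curvature-aware methods tractable at scale; cruder approximations (e.g., a diagonal Fisher) and finer ones (block-diagonal, unit-wise) trade accuracy against this overhead, which is the spectrum the talk surveys.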