Scalable Kronecker-Factored Fisher Approximation for Neural Network Parameter Sensitivity
Abstract
The Fisher Information Matrix (FIM) provides a principled geometric measure of parameter sensitivity in neural networks, but computing and applying the full FIM directly is infeasible for high-dimensional models. As a result, most existing methods fall back on diagonal approximations that discard important correlation structure. We introduce Matrix-free Fisher Factorization (MFF), a GPU-tractable algorithm that estimates a Kronecker-factored approximation of the FIM, capturing both diagonal and off-diagonal dependencies without ever materializing the full matrix. For post-training compression of neural network layers, we prove that under Matrix-Variate Normal assumptions, MFF yields GFWSVD (generalized Fisher-weighted SVD), a unique closed-form decomposition of linear layers that minimizes the expected second-order increase in loss. Experiments on controlled numerical benchmarks and large neural networks show that GFWSVD achieves up to 50\% compression while matching or exceeding state-of-the-art diagonal and activation-based baselines on most tasks, and it reliably avoids collapse on dense architectures such as Llama 3. Moreover, when used to initialize existing optimization pipelines (e.g., Dobi-SVD), GFWSVD preserves accuracy better at 40\% parameter reduction in regimes where standard methods degrade substantially. Together, these results position MFF and GFWSVD as foundational algorithmic primitives for scalable, second-order-aware neural network approximation and parameter-sensitivity analysis.
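For intuition, here is a minimal sketch of the optimality argument behind GFWSVD, in our own notation (the symbols $A$, $B$, $W$, and $\Delta W$ are illustrative assumptions, not necessarily the conventions used in the paper's body): suppose a layer's Fisher block admits a Kronecker factorization $F \approx B \otimes A$ with positive-definite factors $A$ and $B$. Replacing the weight matrix $W$ by a low-rank approximation $\widetilde{W}$, with $\Delta W = W - \widetilde{W}$, incurs an expected second-order loss increase of
\begin{align}
\Delta \mathcal{L} \;\approx\; \tfrac{1}{2}\,\mathrm{vec}(\Delta W)^{\top} F\, \mathrm{vec}(\Delta W)
\;\approx\; \tfrac{1}{2}\,\mathrm{vec}(\Delta W)^{\top} (B \otimes A)\, \mathrm{vec}(\Delta W)
\;=\; \tfrac{1}{2}\,\bigl\| A^{1/2}\, \Delta W\, B^{1/2} \bigr\|_F^2 ,
\end{align}
using the identity $(B \otimes A)\,\mathrm{vec}(X) = \mathrm{vec}(A X B^{\top})$. By the Eckart--Young theorem applied in these whitened coordinates, the rank-$r$ minimizer is
\[
\widetilde{W} \;=\; A^{-1/2}\, \bigl[\, A^{1/2}\, W\, B^{1/2} \,\bigr]_r\, B^{-1/2} ,
\]
where $[\,\cdot\,]_r$ denotes truncation to the top $r$ singular triples. The minimizer is unique whenever the $r$-th and $(r{+}1)$-th singular values of $A^{1/2} W B^{1/2}$ are distinct, consistent with the uniqueness claim above.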