ICML Minimal Random Code Learning with Mean-KL Parameterization

Poster in Workshop: Neural Compression: From Information Theory to Applications

Jihao Andreas Lin · Gergely Flamich · Jose Miguel Hernandez-Lobato

Abstract: This paper studies the qualitative behavior and robustness of two variants of Minimal Random Code Learning (MIRACLE) used to compress variational Bayesian neural networks. MIRACLE implements a powerful, conditionally Gaussian variational approximation for the weight posterior $Q_{\mathbf{w}}$ and uses relative entropy coding to compress a weight sample from the posterior using a Gaussian coding distribution $P_{\mathbf{w}}$. To achieve the desired compression rate, $D_{\mathrm{KL}}[Q_{\mathbf{w}} \Vert P_{\mathbf{w}}]$ must be constrained, which requires a computationally expensive annealing procedure under the conventional mean-variance (Mean-Var) parameterization for $Q_{\mathbf{w}}$. Instead, we parameterize $Q_{\mathbf{w}}$ by its mean and KL divergence from $P_{\mathbf{w}}$ to constrain the compression cost to the desired value by construction. We demonstrate that variational training with Mean-KL parameterization converges twice as fast and maintains predictive performance after compression. Furthermore, we show that Mean-KL leads to more meaningful variational distributions with heavier tails and compressed weight samples which are more robust to pruning.
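The idea of fixing the KL divergence by construction can be illustrated for a single weight. The following is a minimal sketch, not the authors' implementation. It assumes a one-dimensional $Q = \mathcal{N}(\mu, \sigma^2)$ and a zero-mean coding distribution $P = \mathcal{N}(0, \sigma_p^2)$, and it chooses the solution with $\sigma \le \sigma_p$; the abstract states none of these details. The function names `gaussian_kl` and `sigma_from_mean_kl` are hypothetical. Writing $r = \sigma^2/\sigma_p^2$, the KL constraint becomes $r - \log r = 2\kappa - \mu^2/\sigma_p^2 + 1$, which the sketch solves by bisection.

```python
import math


def gaussian_kl(mu, sigma, sigma_p):
    """KL[N(mu, sigma^2) || N(0, sigma_p^2)] in nats (zero-mean P is an assumption)."""
    return math.log(sigma_p / sigma) + (sigma**2 + mu**2) / (2 * sigma_p**2) - 0.5


def sigma_from_mean_kl(mu, kappa, sigma_p=1.0, iters=200):
    """Return sigma <= sigma_p such that the KL above equals kappa.

    Hypothetical helper: solves r - log(r) = c for r = (sigma / sigma_p)^2
    on (0, 1], where c = 2 * kappa - (mu / sigma_p)^2 + 1.
    """
    c = 2.0 * kappa - (mu / sigma_p) ** 2 + 1.0
    if c < 1.0:
        # r - log(r) >= 1 for all r > 0, so no sigma can reach this KL:
        # the mean alone already costs more than kappa nats.
        raise ValueError("|mu| is too large for the requested KL")

    # f(r) = r - log(r) - c is decreasing on (0, 1], with
    # f(exp(-c)) = exp(-c) > 0 and f(1) = 1 - c <= 0, so the root is bracketed.
    lo, hi = math.exp(-c), 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - math.log(mid) - c > 0:
            lo = mid
        else:
            hi = mid
    return sigma_p * math.sqrt(0.5 * (lo + hi))


if __name__ == "__main__":
    sigma = sigma_from_mean_kl(mu=0.5, kappa=1.0, sigma_p=2.0)
    print(sigma, gaussian_kl(0.5, sigma, 2.0))
```

Under these assumptions the mean is only feasible when $|\mu| \le \sigma_p \sqrt{2\kappa}$, which is why the helper raises an error otherwise. The same root can be written in closed form as $r = -W_0(-e^{-c})$ with the principal branch of the Lambert W function (available as `scipy.special.lambertw`); bisection is used here only to keep the sketch free of dependencies.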
