Thu Jul 12th 06:15 -- 09:00 PM @ Hall B #77
Distilling the Posterior in Bayesian Neural Networks
Kuan-Chieh Wang · Paul Vicol · James Lucas · Li Gu · Roger Grosse · Richard Zemel

In many applications of deep learning, it is crucial to capture model and prediction uncertainty. Unlike classic neural networks (NN), Bayesian neural networks (BNN) allow us to reason about uncertainty in a more principled way. Stochastic Gradient Langevin Dynamics (SGLD) enables learning a BNN with only simple modifications to the standard optimization framework (SGD). Instead of obtaining a single point-estimate of the model, the result of SGLD is samples from the BNN posterior. However, SGLD and its extensions require storage of the entire history of model parameters, a potentially prohibitive cost (especially for large neural networks). We propose a framework, Adversarial Posterior Distillation, to distill the SGLD samples using Generative Adversarial Networks (GAN). At test-time, samples are generated by the GAN. We show that this distillation framework incurs no loss in performance on recent BNN applications including anomaly detection, active learning, and defense against attacks. By construction, our framework not only distills the Bayesian predictive distribution, but the posterior itself. This allows users to compute quantity such as the approximate model variance, which is useful in the downstream tasks.