

Session

Deep Learning (Bayesian) 2


Thu 12 July 7:00 - 7:20 PDT

Variational Bayesian dropout: pitfalls and fixes

Jiri Hron · Alexander Matthews · Zoubin Ghahramani

Dropout, a stochastic regularisation technique for training of neural networks, has recently been reinterpreted as a specific type of approximate inference algorithm for Bayesian neural networks. The main contribution of the reinterpretation is in providing a theoretical framework useful for analysing and extending the algorithm. We show that the proposed framework suffers from several issues: from undefined or pathological behaviour of the true posterior related to use of improper priors, to an ill-defined variational objective due to singularity of the approximating distribution relative to the true posterior. Our analysis of the improper log uniform prior used in variational Gaussian dropout suggests the pathologies are generally irredeemable, and that the algorithm still works only because the variational formulation annuls some of the pathologies. To address the singularity issue, we proffer Quasi-KL (QKL) divergence, a new approximate inference objective for approximation of high-dimensional distributions. We show that motivations for variational Bernoulli dropout based on discretisation and noise have QKL as a limit. Properties of QKL are studied both theoretically and on a simple practical example which shows that the QKL-optimal approximation of a full rank Gaussian with a degenerate one naturally leads to the Principal Component Analysis solution.

Thu 12 July 7:20 - 7:30 PDT

Accurate Uncertainties for Deep Learning Using Calibrated Regression

Volodymyr Kuleshov · Nathan Fenner · Stefano Ermon

Accounting for uncertainty in modern deep learning algorithms is crucial for building reliable, interpretable, and interactive systems. Existing approaches typically center on Bayesian methods, which may not always accurately capture real-world uncertainty, e.g. a 95% confidence interval may not contain the true outcome 95% of the time. Here, we propose a simple procedure that is guaranteed to calibrate probabilistic forecasts obtained from Bayesian deep learning models as well as general regression algorithms. Our procedure is inspired by Platt scaling for support vector machines and extends existing recalibration methods for classification to regression tasks. We evaluate our method on Bayesian linear regression as well as feedforward and recurrent Bayesian neural networks trained with approximate variational inference. We find that our method produces calibrated uncertainty estimates and improves performance on tasks in time series forecasting and reinforcement learning.
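The recalibration idea in the abstract can be illustrated with a minimal sketch (not the authors' code): learn an isotonic map R so that R(F_t(y_t)) is uniform on held-out data, where F_t is the model's predictive CDF at point t. All names and the synthetic data below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
n = 2000
mu = rng.normal(size=n)                  # model's predictive means
sigma = np.full(n, 2.0)                  # model overstates noise (true std is 0.5)
y = mu + 0.5 * rng.normal(size=n)        # held-out outcomes

p = norm.cdf(y, loc=mu, scale=sigma)     # predicted CDF evaluated at each outcome
emp = (p[:, None] <= p[None, :]).mean(axis=0)   # empirical CDF of those values
R = IsotonicRegression(out_of_bounds="clip").fit(p, emp)

q = R.predict(p)                         # recalibrated probability levels
before = (p <= 0.9).mean()               # miscalibrated: far from the nominal 0.9
after = (q <= 0.9).mean()                # close to the nominal 0.9
```

Because the model's intervals are too wide, essentially all outcomes fall below the 90% predicted quantile before recalibration; after mapping through R, the observed frequency at each level matches the nominal one.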

Thu 12 July 7:30 - 7:40 PDT

Decomposition of Uncertainty in Bayesian Deep Learning for Efficient and Risk-sensitive Learning

Stefan Depeweg · Jose Miguel Hernandez-Lobato · Finale Doshi-Velez · Steffen Udluft

Bayesian neural networks with latent variables are scalable and flexible probabilistic models: they account for uncertainty in the estimation of the network weights and, by making use of latent variables, can capture complex noise patterns in the data. Using these models we show how to perform and utilize a decomposition of uncertainty in aleatoric and epistemic components for decision-making purposes. This allows us to successfully identify informative points for active learning of functions with heteroscedastic and bimodal noise. Using the decomposition we further define a novel risk-sensitive criterion for reinforcement learning to identify policies that balance expected cost, model-bias and noise aversion.
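The decomposition the abstract refers to follows the law of total variance; a minimal sketch using fake posterior samples (all array names and numbers are illustrative, not the paper's setup): `means[k, t]` and `noise_vars[k, t]` stand for the predictive mean and noise variance under the k-th posterior weight sample at input t.

```python
import numpy as np

rng = np.random.default_rng(1)
K, T = 200, 4
spread = np.array([0.0, 0.0, 1.0, 1.0])   # how much the weight samples disagree
noise = np.array([0.1, 1.0, 0.1, 1.0])    # intrinsic (aleatoric) noise level
means = rng.normal(size=(K, T)) * spread
noise_vars = np.tile(noise, (K, 1))

aleatoric = noise_vars.mean(axis=0)   # E_w[ Var(y | x, w) ]: irreducible noise
epistemic = means.var(axis=0)         # Var_w[ E(y | x, w) ]: model uncertainty
total = aleatoric + epistemic         # law of total variance
```

For active learning one would rank candidate inputs by `epistemic` alone: here inputs 2 and 3 score highest because the weight samples disagree there, regardless of how large the aleatoric noise is.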

Thu 12 July 7:40 - 7:50 PDT

Scalable approximate Bayesian inference for particle tracking data

Ruoxi Sun · Liam Paninski

Many important datasets in physics, chemistry, and biology consist of noisy sequences of images of multiple moving overlapping particles. In many cases, the observed particles are indistinguishable, leading to unavoidable uncertainty about nearby particles’ identities. Exact Bayesian inference is intractable in this setting, and previous approximate Bayesian methods scale poorly. Non-Bayesian approaches that output a single “best” estimate of the particle tracks (thus discarding important uncertainty information) are therefore dominant in practice. Here we propose a flexible and scalable amortized approach for Bayesian inference on this task. We introduce a novel neural network method to approximate the (intractable) filter-backward-sample-forward algorithm for Bayesian inference in this setting. By varying the simulated training data for the network, we can perform inference on a wide variety of data types. This approach is therefore highly flexible and improves on the state of the art in terms of accuracy; provides uncertainty estimates about the particle locations and identities; and has a test run-time that scales linearly as a function of the data length and number of particles, thus enabling Bayesian inference in arbitrarily large particle tracking datasets.
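The filter-and-sample routine the paper's network amortizes is intractable for particle tracks, but the analogous exact pass on a toy two-state hidden Markov model shows the structure: filter over time, then sample one posterior state sequence. This is a standard forward-filter, backward-sample sketch with made-up probabilities, not the paper's particle model.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])          # transition probabilities A[s, s']
B = np.array([[0.8, 0.2],
              [0.3, 0.7]])          # emission probabilities B[s, obs]
pi = np.array([0.5, 0.5])           # initial state distribution
obs = [0, 0, 1, 1, 1]

# Filtering: alpha[t, s] = p(state_t = s | obs[: t + 1])
alpha = np.zeros((len(obs), 2))
alpha[0] = pi * B[:, obs[0]]
alpha[0] /= alpha[0].sum()
for t in range(1, len(obs)):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    alpha[t] /= alpha[t].sum()

# Sampling: draw the last state, then each earlier state conditioned on its
# successor; the result is one exact sample from the posterior over sequences.
states = [rng.choice(2, p=alpha[-1])]
for t in range(len(obs) - 2, -1, -1):
    w = alpha[t] * A[:, states[-1]]
    states.append(rng.choice(2, p=w / w.sum()))
states.reverse()
```

Repeating the sampling pass yields independent posterior samples, which is exactly the uncertainty information a single "best" track discards.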

Thu 12 July 7:50 - 8:00 PDT

Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam

Mohammad Emtiyaz Khan · Didrik Nielsen · Voot Tangkaratt · Wu Lin · Yarin Gal · Akash Srivastava

Uncertainty computation in deep learning is essential to design robust and reliable systems. Variational inference (VI) is a promising approach for such computation, but requires more effort to implement and execute compared to maximum-likelihood methods. In this paper, we propose new natural-gradient algorithms to reduce such efforts for Gaussian mean-field VI. Our algorithms can be implemented within the Adam optimizer by perturbing the network weights during gradient evaluations, and uncertainty estimates can be cheaply obtained by using the vector that adapts the learning rate. This requires lower memory, computation, and implementation effort than existing VI methods, while obtaining uncertainty estimates of comparable quality. Our empirical results confirm this and further suggest that the weight-perturbation in our algorithm could be useful for exploration in reinforcement learning and stochastic optimization.
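A loose one-dimensional sketch of the idea, not the paper's exact update: run an Adam-style loop in which the weight is perturbed before each gradient evaluation, and read a variance estimate off the second-moment ("scale") accumulator. The toy model, step sizes, and variable names below are all illustrative assumptions: N observations y_i ~ N(w, 1), per-example loss 0.5 * (w - y_i)^2, prior precision lam.

```python
import numpy as np

rng = np.random.default_rng(0)
N, lam = 100, 1.0
alpha, beta = 0.05, 0.1
data = 3.0 + rng.normal(size=N)       # true w is 3
mu, s = 0.0, 1.0                      # variational mean; scale accumulator

trace = []
for step in range(6000):
    sigma = 1.0 / np.sqrt(N * (s + lam / N))    # uncertainty from the scale
    w = mu + sigma * rng.normal()               # weight perturbation
    y = data[rng.integers(N)]                   # minibatch of size one
    g = w - y                                   # per-example gradient at w
    s = (1 - beta) * s + beta * g * g           # Adam-style second moment
    mu -= alpha * (g + lam * mu / N) / (np.sqrt(s) + lam / N)
    if step >= 4000:
        trace.append(mu)

mu_hat = float(np.mean(trace))        # settles near the posterior mean (~3)
sigma_hat = sigma                     # settles near the posterior std (~0.1)
```

The only changes relative to a plain Adam loop are the perturbation of `w` before the gradient call and the extra prior terms, which is what makes the implementation effort comparable to maximum-likelihood training.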