Session
Deep Learning (Bayesian) 2
Variational Bayesian dropout: pitfalls and fixes
Jiri Hron · Alexander Matthews · Zoubin Ghahramani
Dropout, a stochastic regularisation technique for training of neural networks, has recently been reinterpreted as a specific type of approximate inference algorithm for Bayesian neural networks. The main contribution of the reinterpretation is in providing a theoretical framework useful for analysing and extending the algorithm. We show that the proposed framework suffers from several issues: from undefined or pathological behaviour of the true posterior related to the use of improper priors, to an ill-defined variational objective due to singularity of the approximating distribution relative to the true posterior. Our analysis of the improper log-uniform prior used in variational Gaussian dropout suggests the pathologies are generally irredeemable, and that the algorithm still works only because the variational formulation annuls some of the pathologies. To address the singularity issue, we proffer Quasi-KL (QKL) divergence, a new approximate inference objective for approximation of high-dimensional distributions. We show that motivations for variational Bernoulli dropout based on discretisation and noise have QKL as a limit. Properties of QKL are studied both theoretically and on a simple practical example, which shows that the QKL-optimal approximation of a full-rank Gaussian with a degenerate one naturally leads to the Principal Component Analysis solution.
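To make the singularity issue concrete, here is a sketch of the standard fact the abstract alludes to (not the paper's own derivation): the ELBO decomposition is only informative when the approximating distribution q is absolutely continuous with respect to the true posterior.

```latex
% ELBO decomposition and the singularity problem (standard definitions):
\log p(\mathcal{D})
  = \mathcal{L}(q) + \mathrm{KL}\!\left(q \,\|\, p(\cdot \mid \mathcal{D})\right),
\qquad
\mathrm{KL}(q \,\|\, p) =
\begin{cases}
\displaystyle\int \log \frac{\mathrm{d}q}{\mathrm{d}p}\,\mathrm{d}q, & q \ll p,\\[6pt]
+\infty, & \text{otherwise.}
\end{cases}
```

For instance, a degenerate (rank-deficient) Gaussian q places all its mass on a lower-dimensional subspace and is therefore singular relative to any full-rank Gaussian posterior, so KL(q ∥ p) = +∞ for every such q and the variational objective cannot rank them; this is the gap the proposed QKL objective is meant to fill.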
Accurate Uncertainties for Deep Learning Using Calibrated Regression
Volodymyr Kuleshov · Nathan Fenner · Stefano Ermon
Accounting for uncertainty in modern deep learning algorithms is crucial for building reliable, interpretable, and interactive systems. Existing approaches typically center on Bayesian methods, which may not always accurately capture real-world uncertainty, e.g. a 95% confidence interval may not contain the true outcome 95% of the time. Here, we propose a simple procedure that is guaranteed to calibrate probabilistic forecasts obtained from Bayesian deep learning models as well as general regression algorithms. Our procedure is inspired by Platt scaling for support vector machines and extends existing recalibration methods for classification to regression tasks. We evaluate our method on Bayesian linear regression as well as feedforward and recurrent Bayesian neural networks trained with approximate variational inference. We find that our method produces calibrated uncertainty estimates and improves performance on tasks in time series forecasting and reinforcement learning.
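The recalibration idea admits a short sketch. Below is a minimal, hypothetical implementation of the quantile-recalibration step described in the abstract: on a held-out calibration set, fit an isotonic regression mapping the model's predicted CDF levels to their empirical frequencies (function and variable names are ours, not the authors').

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def fit_recalibrator(cdf_at_outcome):
    """cdf_at_outcome[i] = H(x_i)(y_i): the model's predictive CDF
    evaluated at the observed outcome, for each calibration point."""
    p = np.asarray(cdf_at_outcome, dtype=float)
    # Empirical frequency: fraction of calibration points whose predicted
    # CDF value falls at or below each nominal level p_i.
    emp = np.array([(p <= pi).mean() for pi in p])
    iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    iso.fit(p, emp)
    return iso

# Usage: if the model claims P(y <= t) = 0.95, report iso.predict([0.95])
# as the calibrated coverage level for that interval instead.
```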
Decomposition of Uncertainty in Bayesian Deep Learning for Efficient and Risk-sensitive Learning
Stefan Depeweg · Jose Miguel Hernandez-Lobato · Finale Doshi-Velez · Steffen Udluft
Bayesian neural networks with latent variables are scalable and flexible probabilistic models: they account for uncertainty in the estimation of the network weights and, by making use of latent variables, can capture complex noise patterns in the data. Using these models, we show how to perform and utilize a decomposition of uncertainty into aleatoric and epistemic components for decision-making purposes. This allows us to successfully identify informative points for active learning of functions with heteroscedastic and bimodal noise. Using the decomposition, we further define a novel risk-sensitive criterion for reinforcement learning to identify policies that balance expected cost, model-bias and noise aversion.
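One standard way to realize such a decomposition is via the law of total variance, estimated from posterior weight samples: Var[y] = E_w[Var(y | w)] (aleatoric) + Var_w(E[y | w]) (epistemic). The sketch below uses placeholder data in place of actual network outputs; the paper's exact criterion may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for network outputs under S sampled weight vectors w_s:
# means[s] = E[y | w_s], vars_[s] = Var(y | w_s)  (placeholder data).
means = rng.normal(0.0, 0.5, size=100)
vars_ = rng.uniform(0.5, 1.5, size=100)

aleatoric = vars_.mean()  # noise the model itself predicts, on average
epistemic = means.var()   # disagreement across posterior weight samples
total = aleatoric + epistemic
```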
Scalable approximate Bayesian inference for particle tracking data
Ruoxi Sun · Liam Paninski
Many important datasets in physics, chemistry, and biology consist of noisy sequences of images of multiple moving overlapping particles. In many cases, the observed particles are indistinguishable, leading to unavoidable uncertainty about nearby particles’ identities. Exact Bayesian inference is intractable in this setting, and previous approximate Bayesian methods scale poorly. Non-Bayesian approaches that output a single “best” estimate of the particle tracks (thus discarding important uncertainty information) are therefore dominant in practice. Here we propose a flexible and scalable amortized approach for Bayesian inference on this task. We introduce a novel neural network method to approximate the (intractable) filter-backward-sample-forward algorithm for Bayesian inference in this setting. By varying the simulated training data for the network, we can perform inference on a wide variety of data types. This approach is therefore highly flexible and improves on the state of the art in terms of accuracy; provides uncertainty estimates about the particle locations and identities; and has a test run-time that scales linearly as a function of the data length and number of particles, thus enabling Bayesian inference in arbitrarily large particle tracking datasets.
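The amortization strategy can be illustrated with a toy example (entirely hypothetical, and a stand-in for the paper's method, which approximates a filtering/smoothing algorithm rather than regressing tracks directly): train a recurrent network on simulated trajectories so that test-time inference is a single forward pass.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def simulate(batch, T=50, obs_noise=0.3):
    # Latent 1-D random-walk tracks and noisy observations of them.
    steps = 0.1 * torch.randn(batch, T, 1)
    tracks = steps.cumsum(dim=1)
    return tracks + obs_noise * torch.randn_like(tracks), tracks

class AmortizedTracker(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(1, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)  # predictive mean and log-variance

    def forward(self, obs):
        h, _ = self.rnn(obs)
        out = self.head(h)
        return out[..., :1], out[..., 1:]

model = AmortizedTracker()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):
    obs, tracks = simulate(64)
    mean, logvar = model(obs)
    # Gaussian negative log-likelihood: the log-variance head yields a crude
    # per-time-step uncertainty estimate alongside the point estimate.
    loss = (0.5 * ((tracks - mean) ** 2 / logvar.exp() + logvar)).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

Because the network is trained purely on simulated data, changing the simulator changes the inference target with no change to the architecture, which is the source of the flexibility the abstract describes.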
Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam
Mohammad Emtiyaz Khan · Didrik Nielsen · Voot Tangkaratt · Wu Lin · Yarin Gal · Akash Srivastava
Uncertainty computation in deep learning is essential to design robust and reliable systems. Variational inference (VI) is a promising approach for such computation, but requires more effort to implement and execute compared to maximum-likelihood methods. In this paper, we propose new natural-gradient algorithms to reduce such efforts for Gaussian mean-field VI. Our algorithms can be implemented within the Adam optimizer by perturbing the network weights during gradient evaluations, and uncertainty estimates can be cheaply obtained by using the vector that adapts the learning rate. This requires lower memory, computation, and implementation effort than existing VI methods, while obtaining uncertainty estimates of comparable quality. Our empirical results confirm this and further suggest that the weight-perturbation in our algorithm could be useful for exploration in reinforcement learning and stochastic optimization.
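A schematic of the weight-perturbation idea follows (our sketch, not the paper's exact Vadam algorithm; Adam's bias corrections are omitted and all names are ours): perturb the weights with noise whose scale comes from the same second-moment vector that adapts the learning rate, so a per-weight variance estimate falls out of quantities the optimizer already tracks.

```python
import numpy as np

def perturbed_adam_step(w, m, s, grad_fn, n_data, prior_prec=1.0,
                        lr=0.05, b1=0.9, b2=0.999):
    # Perturbation scale derived from the second-moment vector s; its
    # square also serves as the per-weight posterior variance estimate.
    sigma = 1.0 / np.sqrt(n_data * (s + prior_prec))
    g = grad_fn(w + sigma * np.random.randn(*w.shape))
    m = b1 * m + (1 - b1) * (g + prior_prec * w / n_data)
    s = b2 * s + (1 - b2) * g ** 2
    w = w - lr * m / (np.sqrt(s) + prior_prec / n_data)
    return w, m, s, sigma ** 2

# Toy usage on a quadratic loss with minimum at (1, -2, 0.5).
w = np.zeros(3); m = np.zeros(3); s = np.zeros(3)
grad = lambda w: 2.0 * (w - np.array([1.0, -2.0, 0.5]))
for _ in range(500):
    w, m, s, var = perturbed_adam_step(w, m, s, grad, n_data=100)
# `var` approximates the per-weight uncertainty at the current iterate.
```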