Poster
Variational Bayesian dropout: pitfalls and fixes
Jiri Hron · Alexander Matthews · Zoubin Ghahramani
Hall B #194
Dropout, a stochastic regularisation technique for training of neural networks, has recently been reinterpreted as a specific type of approximate inference algorithm for Bayesian neural networks. The main contribution of the reinterpretation is in providing a theoretical framework useful for analysing and extending the algorithm. We show that the proposed framework suffers from several issues; from undefined or pathological behaviour of the true posterior related to the use of improper priors, to an ill-defined variational objective due to singularity of the approximating distribution relative to the true posterior. Our analysis of the improper log uniform prior used in variational Gaussian dropout suggests the pathologies are generally irredeemable, and that the algorithm still works only because the variational formulation annuls some of the pathologies. To address the singularity issue, we proffer Quasi-KL (QKL) divergence, a new approximate inference objective for approximation of high-dimensional distributions. We show that motivations for variational Bernoulli dropout based on discretisation and noise have QKL as a limit. Properties of QKL are studied both theoretically and on a simple practical example which shows that the QKL-optimal approximation of a full rank Gaussian with a degenerate one naturally leads to the Principal Component Analysis solution.
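For context on the singularity issue mentioned above, a standard measure-theoretic statement (not a quotation from the paper) is that the KL term in the variational objective is well defined only when the approximating distribution q is absolutely continuous with respect to the true posterior p; a degenerate q supported on a lower-dimensional set violates this:

\[
\mathrm{KL}(q \,\|\, p) =
\begin{cases}
\displaystyle \int \log \frac{\mathrm{d}q}{\mathrm{d}p}\, \mathrm{d}q & \text{if } q \ll p,\\[4pt]
+\infty & \text{otherwise,}
\end{cases}
\]

so when q is singular with respect to p the usual objective degenerates, which is the issue the proposed Quasi-KL divergence is intended to address.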