Session: Approximate Inference 1
Semi-Implicit Variational Inference
Mingzhang Yin · Mingyuan Zhou
Semi-implicit variational inference (SIVI) is introduced to expand the commonly used analytic variational distribution family by mixing the variational parameter with a flexible distribution. This mixing distribution can assume any density function, explicit or not, as long as independent random samples can be generated via reparameterization. Not only does SIVI expand the variational family to incorporate highly flexible variational distributions, including implicit ones that have no analytic density functions, but it also sandwiches the evidence lower bound (ELBO) between a lower bound and an upper bound, and further derives an asymptotically exact surrogate ELBO that is amenable to optimization via stochastic gradient ascent. With a substantially expanded variational family and a novel optimization algorithm, SIVI is shown to closely match the accuracy of MCMC in inferring the posterior in a variety of Bayesian inference tasks.
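The surrogate objective at the heart of this abstract fits in a few lines. Below is a minimal sketch of the asymptotically exact surrogate ELBO, assuming a toy setup: a standard-Gaussian target, an implicit mixing distribution given by a reparameterized linear map of noise, and an analytic Gaussian conditional q(z | psi). All names here (log_joint, sample_psi, sigma) are illustrative, not from the paper.

```python
# Sketch of SIVI's surrogate ELBO on a toy problem (assumed setup, see above).
import numpy as np

rng = np.random.default_rng(0)

def log_joint(z):
    # Toy unnormalized log target: standard Gaussian.
    return -0.5 * np.sum(z**2, axis=-1)

def sample_psi(phi, n):
    # Implicit mixing distribution via reparameterization: psi = eps @ W.T + b.
    W, b = phi
    eps = rng.standard_normal((n, W.shape[1]))
    return eps @ W.T + b

def log_q_cond(z, psi, sigma):
    # Analytic conditional q(z | psi) = N(z; psi, sigma^2 I).
    d = z.shape[-1]
    return (-0.5 * np.sum((z - psi)**2, axis=-1) / sigma**2
            - 0.5 * d * np.log(2 * np.pi * sigma**2))

def surrogate_elbo(phi, K=20, n=256, sigma=0.5):
    # Replace the intractable log q(z) with the log of a (K+1)-sample mixture
    # over fresh mixing draws: a lower bound that becomes exact as K -> infinity.
    psi = sample_psi(phi, n)                           # psi ~ q_phi(psi)
    z = psi + sigma * rng.standard_normal(psi.shape)   # z ~ q(z | psi)
    extra = sample_psi(phi, K)                         # K extra mixing samples
    logs = [log_q_cond(z, psi, sigma)]
    logs += [log_q_cond(z, extra[k], sigma) for k in range(K)]
    log_mix = np.logaddexp.reduce(np.stack(logs), axis=0) - np.log(K + 1)
    return np.mean(log_joint(z) - log_mix)

phi = (np.eye(2), np.zeros(2))
print(surrogate_elbo(phi))
```

In practice phi would parameterize a neural network and the surrogate would be ascended with automatic differentiation; increasing K tightens the bound toward the exact ELBO.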
Efficient Gradient-Free Variational Inference using Policy Search
Oleg Arenz · Gerhard Neumann · Mingjun Zhong
Inference from complex distributions is a common problem in machine learning and is needed by many Bayesian methods. We propose an efficient, gradient-free method for learning general GMM approximations of multimodal distributions, based on recent insights from stochastic search methods. Our method establishes information-geometric trust regions to ensure efficient exploration of the sampling space and stability of the GMM updates, allowing for efficient estimation of multivariate Gaussian variational distributions. For GMMs, we apply a variational lower bound to decompose the learning objective into sub-problems: learning the individual mixture components and learning the mixture coefficients. The number of mixture components is adapted online in order to allow for arbitrarily exact approximations. We demonstrate on several domains that we can learn significantly better approximations than competing variational inference methods, and that the quality of samples drawn from our approximations is on par with samples created by state-of-the-art MCMC samplers that require significantly more computational resources.
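To make the "gradient-free" aspect concrete, here is a much-simplified update for a single Gaussian component, only in the spirit of the paper's trust-region updates and not the authors' exact algorithm: samples from the component are reweighted by a tempered exponential of the unnormalized log target (the temperature eta loosely playing the role of a trust region), and the component is refit by weighted maximum likelihood. log_p, eta, and all constants are illustrative assumptions.

```python
# Simplified, gradient-free Gaussian component update (illustrative only).
import numpy as np

rng = np.random.default_rng(1)

def log_p(z):
    # Toy bimodal target (up to a constant): modes near (2, 2) and (-2, -2).
    a = -0.5 * np.sum((z - 2.0)**2, axis=-1)
    b = -0.5 * np.sum((z + 2.0)**2, axis=-1)
    return np.logaddexp(a, b)

def update_component(mu, cov, eta=5.0, n=1000):
    z = rng.multivariate_normal(mu, cov, size=n)   # only samples, no gradients
    w = np.exp((log_p(z) - log_p(z).max()) / eta)  # tempered weights; eta acts
    w /= w.sum()                                   # like a trust-region temperature
    new_mu = w @ z                                 # weighted ML refit
    diff = z - new_mu
    new_cov = (w[:, None] * diff).T @ diff
    return new_mu, new_cov

mu, cov = np.ones(2), 4.0 * np.eye(2)
for _ in range(20):
    mu, cov = update_component(mu, cov)
print(mu)  # moves toward the nearby mode at (2, 2)
```

The full method maintains an entire mixture, bounds each update with an information-geometric (KL) trust region rather than a fixed temperature, updates the mixture coefficients from the same samples, and adapts the number of components online.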
A Spectral Approach to Gradient Estimation for Implicit Distributions
Jiaxin Shi · Shengyang Sun · Jun Zhu
Recently there has been increasing interest in learning and inference with implicit distributions (i.e., distributions without tractable densities). To this end, we develop a gradient estimator for implicit distributions based on Stein's identity and a spectral decomposition of kernel operators, where the eigenfunctions are approximated by the Nyström method. Unlike previous works that only provide estimates at the sample points, our approach directly estimates the gradient function, thus allowing for a simple and principled out-of-sample extension. We provide theoretical results on the error bound of the estimator and discuss the bias-variance tradeoff in practice. The effectiveness of our method is demonstrated by applications to gradient-free Hamiltonian Monte Carlo and variational inference with implicit distributions. Finally, we discuss the intuition behind the estimator by drawing connections between the Nyström method and kernel PCA, which indicates that the estimator can automatically adapt to the geometry of the underlying distribution.
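The estimator described above admits a compact implementation. The sketch below follows one common presentation of a spectral Stein-type gradient estimator with an RBF kernel: eigenpairs of the kernel Gram matrix give Nyström eigenfunctions, whose Stein-identity coefficients are combined into an estimate of grad log q at arbitrary query points. The kernel choice, bandwidth h, and truncation level J are assumptions of this sketch, not the paper's notation.

```python
# Spectral Stein-type gradient estimator with an RBF kernel (sketch).
import numpy as np

def rbf(x, y, h):
    d2 = np.sum((x[:, None, :] - y[None, :, :])**2, axis=-1)
    return np.exp(-d2 / (2 * h**2))

def grad_log_q(x_eval, x_samp, J=6, h=1.0):
    M = x_samp.shape[0]
    G = rbf(x_samp, x_samp, h)                    # Gram matrix of the samples
    lam, U = np.linalg.eigh(G)
    lam, U = lam[::-1][:J], U[:, ::-1][:, :J]     # keep the top-J eigenpairs
    # Nystrom eigenfunctions: psi_j(x) = sqrt(M) / lam_j * sum_m u_{mj} k(x, x_m)
    psi = lambda x: np.sqrt(M) * rbf(x, x_samp, h) @ U / lam
    # Gradient of the kernel in its first argument, at the sample points:
    diff = x_samp[:, None, :] - x_samp[None, :, :]
    dG = -diff / h**2 * G[:, :, None]
    dpsi = np.sqrt(M) * np.einsum('mnd,nj->mjd', dG, U) / lam[None, :, None]
    # Stein-identity coefficients: beta_j = -(1/mu_j) E[grad psi_j], mu_j = lam_j / M.
    beta = -np.mean(dpsi, axis=0) * (M / lam)[:, None]
    return psi(x_eval) @ beta                     # out-of-sample estimates, (N, D)

# Sanity check on a standard Gaussian, where grad log q(x) = -x:
rng = np.random.default_rng(2)
xs = rng.standard_normal((500, 1))
print(grad_log_q(np.array([[1.0], [-1.0]]), xs, J=4))
```

Because the eigenfunctions are defined at arbitrary points via the Nyström formula, the estimate extends beyond the sample set, which is what the abstract's out-of-sample extension refers to.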
Quasi-Monte Carlo Variational Inference
Alexander Buchholz · Florian Wenzel · Stephan Mandt
Many machine learning problems involve Monte Carlo gradient estimators. As a prominent example, we focus on Monte Carlo variational inference (MCVI) in this paper. The performance of MCVI crucially depends on the variance of its stochastic gradients. We propose variance reduction by means of Quasi-Monte Carlo (QMC) sampling. QMC replaces N i.i.d. samples from a uniform probability distribution by a deterministic sequence of samples of length N. This sequence covers the underlying random variable space more evenly than i.i.d. draws, reducing the variance of the gradient estimator. With our novel approach, both the score function and the reparameterization gradient estimators lead to much faster convergence. We also propose a new algorithm for Monte Carlo objectives, where we operate with a constant learning rate and increase the number of QMC samples per iteration. We prove that this way, our algorithm can converge asymptotically at a faster rate than SGD. We furthermore provide theoretical guarantees on QMC for Monte Carlo objectives that go beyond MCVI, and support our findings by several experiments on large-scale data sets from various domains.
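The core variance-reduction idea is easy to demonstrate: draw the reparameterization noise from a randomized (scrambled) Sobol sequence instead of i.i.d. uniforms and compare gradient-estimator variances. The objective f(z) = z^2 and the one-dimensional Gaussian below are toy stand-ins for a variational objective; scipy's stats.qmc module provides the Sobol sampler.

```python
# MC vs. QMC reparameterization gradients on a toy objective (sketch).
import numpy as np
from scipy.stats import norm, qmc

mu, N, reps = 1.5, 64, 200
f_grad = lambda z: 2.0 * z        # f(z) = z^2, so d/dmu E[f(mu + eps)] = 2 mu

def grad_estimate(eps):
    # Reparameterization gradient: z = mu + eps, average f'(z) over the noise.
    return np.mean(f_grad(mu + eps))

rng = np.random.default_rng(3)
mc = [grad_estimate(rng.standard_normal(N)) for _ in range(reps)]

qmc_est = []
for seed in range(reps):
    u = qmc.Sobol(d=1, scramble=True, seed=seed).random(N)  # low-discrepancy uniforms
    qmc_est.append(grad_estimate(norm.ppf(u).ravel()))      # map to Gaussian noise

print("true gradient:", 2 * mu)
print("MC  estimator variance:", np.var(mc))
print("QMC estimator variance:", np.var(qmc_est))  # should come out far smaller
```

The evener coverage of the unit interval by the Sobol points is exactly what drives the variance reduction the abstract describes; scrambling keeps the estimator unbiased while preserving the low-discrepancy structure.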