Large-scale Uncertainty Quantification for Latent Variable Models Using Subsampling Markov Chain Monte Carlo
Abstract
Stochastic gradient Langevin dynamics combined with Gibbs updates (SGLD-Gibbs) provides a highly scalable approach to approximate Bayesian inference in latent variable models. However, it remains unclear how to tune the algorithm's hyperparameters in a principled way so that the resulting uncertainty estimates are statistically meaningful. In this work, we address this gap by developing a statistical scaling limit theory for SGLD-Gibbs. We derive a joint asymptotic limit for the global parameters and latent variables under an appropriate space-time rescaling. We show that the global parameters converge to a diffusion-type limit, while individual latent variables converge to a jump process that reflects their intermittent Gibbs updates. This joint jump-diffusion structure reveals how latent-variable randomness contributes to the stationary distribution of the global parameters. We leverage these results to provide explicit hyperparameter-tuning guidance for SGLD-Gibbs that ensures meaningful uncertainty quantification. Our empirical results show that SGLD-Gibbs with our tuning guidance yields better parameter estimates and uncertainty quantification than stochastic variational inference.
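For concreteness, one iteration of a generic SGLD-Gibbs scheme can be sketched as follows. This is a schematic under standard assumptions, not necessarily the paper's exact variant: data $x_1,\dots,x_N$ with local latent variables $z_i$, global parameter $\theta$, a minibatch $S_t \subset \{1,\dots,N\}$ of size $n$ drawn at step $t$, and step size $\epsilon$.

```latex
% One iteration of a generic SGLD-Gibbs scheme (schematic; notation assumed,
% not taken from the paper). Draw a minibatch S_t of size n, then:
\begin{align*}
  z_i &\sim p(z_i \mid x_i, \theta_t), \quad i \in S_t
      && \text{(Gibbs refresh of the minibatch's latent variables)} \\
  \theta_{t+1} &= \theta_t
      + \frac{\epsilon}{2}\Big(\nabla_\theta \log p(\theta_t)
      + \frac{N}{n}\sum_{i \in S_t} \nabla_\theta \log p(x_i, z_i \mid \theta_t)\Big)
      + \xi_t, \quad \xi_t \sim \mathcal{N}(0, \epsilon I)
      && \text{(SGLD step on the global parameter)}
\end{align*}
```

Under this scheme, each latent variable is only resampled when its data point enters the minibatch, which is the intermittency behind the jump-process limit, while the global parameter receives a small noisy gradient step every iteration, matching the diffusion-type limit; the hyperparameters being tuned are the step size $\epsilon$ and the minibatch size $n$.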