SVRG and Beyond via Posterior Correction
Abstract
Stochastic Variance Reduced Gradient (SVRG) and its variants aim to speed up training by using gradient corrections. In their decade of existence, these methods have never been connected to Bayesian methods, at least not at a fundamental level. Here, we fill this gap and show surprising new connections between SVRG and a recently proposed Bayesian method called ‘posterior correction’. Our main contribution is to show that SVRG can be recovered as a special case of posterior correction when applied over isotropic-Gaussian posteriors. Novel extensions of SVRG are automatically obtained by using more flexible exponential-family posteriors. We derive two such new extensions using Gaussian families: a Newton-like variant with novel Hessian corrections, and an Adam-like extension that scales to large problems. Our work is the first to connect SVRG to Bayes and to use this connection to boost training.
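As a reminder of the gradient correction the abstract refers to, the classical SVRG update can be sketched as follows; the notation ($w_t$, $\tilde{w}$, $f_i$, $\eta$) is standard but assumed here rather than taken from this paper.

```latex
% Classical SVRG update (standard form; notation assumed, not the paper's).
% At a snapshot \tilde{w}, the full gradient \tilde{\mu} = \nabla F(\tilde{w}) is stored.
% Each inner step corrects the stochastic gradient at the current iterate w_t:
\[
    g_t = \nabla f_{i_t}(w_t) - \nabla f_{i_t}(\tilde{w}) + \tilde{\mu},
    \qquad
    w_{t+1} = w_t - \eta\, g_t .
\]
```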