Poster

Demystifying Doubly Stochastic Gradient Descent

Kyurae Kim · Joohwan Ko · Yian Ma · Jacob Gardner


Abstract: Optimization objectives in the form of a sum of intractable expectations are rising in importance (*e.g.*, diffusion models, variational autoencoders, and many more), a setting also known as "finite sum with infinite data." For these problems, a popular strategy is to employ *doubly stochastic gradient descent* (doubly SGD): the expectations are estimated using (potentially correlated) Monte Carlo estimators, while the sum is estimated via minibatch subsampling. Despite its popularity, little is known about the convergence properties of doubly SGD, except under strong assumptions such as uniform smoothness or globally bounded variance. In this work, we establish the convergence of doubly SGD with independent subsampling and random reshuffling under realistic assumptions, which reveals a non-trivial interplay between the subsampling noise and the Monte Carlo noise. As a result, under a per-iteration computational budget of $b \times m$, where $b$ is the minibatch size and $m$ is the number of Monte Carlo samples per component, our theoretical analysis clearly suggests where one should invest most of the budget. Furthermore, we prove that random reshuffling improves the complexity's dependence on the subsampling noise.
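To make the setup concrete, here is a minimal sketch of doubly SGD on a hypothetical toy problem (the problem, step-size schedule, and all names below are illustrative, not from the paper): the objective is $F(x) = \frac{1}{n}\sum_i \mathbb{E}_{\xi}[(x - \mu_i - \xi)^2/2]$ with $\xi \sim \mathcal{N}(0, \sigma^2)$, whose minimizer is the mean of the $\mu_i$. Each gradient estimate spends a $b \times m$ budget: $b$ independently subsampled components, each with $m$ Monte Carlo samples.

```python
import numpy as np

# Hypothetical toy objective: F(x) = (1/n) sum_i E_xi[(x - mu_i - xi)^2 / 2],
# xi ~ N(0, sigma^2). The exact minimizer is mean(mu).
rng = np.random.default_rng(0)
n, sigma = 10, 1.0
mu = rng.normal(size=n)

def doubly_stochastic_grad(x, b, m):
    """Doubly stochastic gradient estimate of grad F(x):
    subsample b components (subsampling noise), then estimate each
    component's expectation with m Monte Carlo samples (MC noise)."""
    idx = rng.choice(n, size=b, replace=False)   # independent subsampling
    xi = rng.normal(scale=sigma, size=(b, m))    # Monte Carlo samples
    # d/dx of (x - mu_i - xi)^2 / 2 is (x - mu_i - xi)
    return np.mean(x - mu[idx, None] - xi)

x = 5.0
for t in range(2000):
    x -= 0.5 / (1 + 0.05 * t) * doubly_stochastic_grad(x, b=2, m=4)

# With a decaying step size, x should end up near the minimizer mean(mu).
print(x, mu.mean())
```

The per-iteration cost is $b \times m$ noise draws; the paper's analysis concerns how this fixed budget should be split between $b$ and $m$, which this sketch does not attempt to answer.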