Poster
Demystifying Doubly Stochastic Gradient Descent
Kyurae Kim · Joohwan Ko · Yian Ma · Jacob Gardner
Abstract:
Optimization objectives in the form of a sum of intractable expectations are rising in importance (*e.g.*, diffusion models, variational autoencoders, and many more), a setting also known as "finite sum with infinite data." For these problems, a popular strategy is to employ *doubly stochastic gradient descent* (doubly SGD): the expectations are estimated using (potentially correlated) Monte Carlo estimators, while the sum is estimated using subsampling. Despite its popularity, little is known about the convergence properties of doubly SGD, except under strong conditions such as uniform smoothness or globally bounded variance. In this work, we establish the convergence of doubly SGD with independent subsampling and random reshuffling under realizable assumptions, which reveals a non-trivial interplay between subsampling and Monte Carlo sampling. As a result, under a per-iteration computational budget of $b \times m$, where $b$ is the minibatch size and $m$ is the number of Monte Carlo samples, our theoretical analysis clearly suggests where one should invest most of the budget. Furthermore, we prove that random reshuffling improves the complexity dependence on the subsampling noise.
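To make the setup concrete, here is a minimal sketch of doubly SGD on a hypothetical toy problem (the objective, offsets `a`, and all hyperparameters below are illustrative assumptions, not the paper's experiments): each term of the finite sum is an expectation over inner noise `z`, the sum is estimated by subsampling `b` terms, and each term's gradient is estimated with `m` Monte Carlo samples.

```python
import numpy as np

# Toy objective: F(theta) = (1/n) * sum_i E_z[(theta - a_i - z)^2], z ~ N(0, 0.1^2).
# Each term is an expectation (tractable here, so we can check the answer):
# since E[z] = 0, the minimizer is mean(a).
rng = np.random.default_rng(0)
n = 100
a = rng.uniform(0.0, 1.0, size=n)  # per-term offsets (illustrative)

def doubly_sgd(steps=5000, b=4, m=4, step_size=0.05):
    """Doubly SGD: subsample b terms of the sum, then estimate each
    term's gradient with m Monte Carlo samples of the inner noise z."""
    theta = 0.0
    avg, count = 0.0, 0
    for t in range(steps):
        idx = rng.choice(n, size=b, replace=False)  # subsampling the sum
        z = rng.normal(0.0, 0.1, size=(b, m))       # Monte Carlo samples per term
        # d/dtheta of (theta - a_i - z)^2 is 2 * (theta - a_i - z);
        # average over the b x m samples drawn this iteration
        grad = 2.0 * (theta - a[idx][:, None] - z).mean()
        theta -= step_size * grad
        if t >= steps // 2:  # tail averaging to damp the two noise sources
            avg += theta
            count += 1
    return avg / count

theta_hat = doubly_sgd()
print(abs(theta_hat - a.mean()))  # small: close to the true minimizer mean(a)
```

The per-iteration cost is $b \times m$ gradient evaluations, which is exactly the budget trade-off the abstract refers to: for a fixed product, one can spend more on subsampling (larger `b`) or on Monte Carlo (larger `m`).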