

Semi-Cyclic Stochastic Gradient Descent

Hubert Eichner · Tomer Koren · Brendan McMahan · Nati Srebro · Kunal Talwar

Pacific Ballroom #148

Keywords: [ Parallel and Distributed Learning ] [ Optimization - Others ] [ Large Scale Learning and Big Data ]


We consider convex SGD updates with a block-cyclic structure, i.e., where each cycle consists of a small number of blocks, each with many samples from a possibly different, block-specific, distribution. This situation arises, e.g., in Federated Learning, where the mobile devices available for updates at different times of day have different characteristics. We show that such a block-cyclic structure can significantly deteriorate the performance of SGD, but propose a simple approach that allows prediction with the same guarantees as for i.i.d., non-cyclic sampling.
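The abstract does not spell out the proposed method, so the following is a minimal sketch, not the paper's code: it simulates the block-cyclic sampling setting in Python with two toy block-specific distributions, and illustrates one common reading of the fix (a "pluralistic" predictor that keeps a separate averaged iterate per block). All function names, hyperparameters, and the two synthetic distributions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical block-specific data distributions (stand-ins for, e.g.,
# daytime vs. nighttime device populations in Federated Learning).
def sample(block):
    # Linear model y = <w*, x> + noise, with a different w* per block.
    w_star = np.array([1.0, 0.0]) if block == 0 else np.array([0.0, 1.0])
    x = rng.normal(size=2)
    y = w_star @ x + 0.1 * rng.normal()
    return x, y

n_cycles, blocks_per_cycle, samples_per_block = 50, 2, 100
lr = 0.01

# One shared SGD iterate, plus one running average of iterates per block
# (an assumed "pluralistic" reading: predict for block i using the average
# of the iterates produced while processing block i's samples).
w = np.zeros(2)
block_avg = [np.zeros(2) for _ in range(blocks_per_cycle)]
block_cnt = [0] * blocks_per_cycle

for _ in range(n_cycles):
    for block in range(blocks_per_cycle):       # semi-cyclic block order
        for _ in range(samples_per_block):      # many samples per block
            x, y = sample(block)
            grad = 2 * (w @ x - y) * x          # squared-loss gradient
            w -= lr * grad
            block_cnt[block] += 1
            block_avg[block] += (w - block_avg[block]) / block_cnt[block]

for b in range(blocks_per_cycle):
    print(f"block {b}: averaged iterate {np.round(block_avg[b], 3)}")
print(f"final shared iterate:  {np.round(w, 3)}")
```

The sketch makes the difficulty visible: because consecutive updates within a block all come from the same distribution, the shared iterate `w` drifts toward whichever block was processed last, whereas each per-block average settles near that block's own optimum, matching the kind of block-specific guarantee the abstract describes.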
