

Delay-Adaptive Step-sizes for Asynchronous Learning

Xuyang Wu · Sindri Magnusson · Hamid Reza Feyzmahdavian · Mikael Johansson

Hall E #703

Keywords: [ T: Optimization ] [ OPT: Convex ] [ OPT: First-order ] [ OPT: Optimization and Learning under Uncertainty ] [ OPT: Learning for Optimization ] [ OPT: Large Scale, Parallel and Distributed ]


In scalable machine learning systems, model training is often parallelized over multiple nodes that run without tight synchronization. Most convergence analyses of the resulting asynchronous algorithms rely on an upper bound on the information delays in the system to determine learning rates. Such bounds are hard to obtain in advance, and the learning rates derived from them lead to unnecessarily slow convergence. In this paper, we show that it is possible to use learning rates that depend on the actual time-varying delays in the system. We develop general convergence results for delay-adaptive asynchronous iterations and specialize these to proximal incremental gradient descent and block coordinate descent algorithms. For each of these methods, we demonstrate how delays can be measured online, present delay-adaptive step-size policies, and illustrate their theoretical and practical advantages over the state of the art.
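To make the idea of a delay-dependent learning rate concrete, the following is a minimal single-process sketch, not the step-size policy analyzed in the paper. It assumes a simulated worker that reads an iterate stamped with the iteration count, so the delay can be measured when the gradient is applied as the difference between the current count and the stamp. The function name `delay_adaptive_gd`, the random delay model, and the rule `base_step / (1 + tau)` are all hypothetical choices made for illustration.

```python
import random


def delay_adaptive_gd(grad, x0, base_step, num_iters, max_delay, seed=0):
    """Simulate gradient descent with stale gradients and a delay-dependent step.

    Illustrative assumption: the step at iteration k is base_step / (1 + tau_k),
    where tau_k is the measured delay. This is NOT the policy from the paper.
    """
    rng = random.Random(seed)
    history = [list(x0)]  # history[k] holds the iterate x_k
    delays = []
    x = list(x0)
    for k in range(num_iters):
        # The simulated worker read the iterate at iteration `stamp` <= k.
        stamp = rng.randint(max(0, k - max_delay), k)
        tau = k - stamp  # delay measured from the stamp, no bound needed
        g = grad(history[stamp])  # gradient evaluated at the stale iterate
        step = base_step / (1 + tau)  # smaller step for staler gradients
        x = [xi - step * gi for xi, gi in zip(x, g)]
        history.append(x)
        delays.append(tau)
    return x, delays


if __name__ == "__main__":
    # Toy problem: minimize 0.5 * ||x - b||^2, whose gradient is x - b.
    b = [1.0, -2.0]
    grad = lambda x: [xi - bi for xi, bi in zip(x, b)]
    x, delays = delay_adaptive_gd(grad, [0.0, 0.0], 0.2, 5000, 3)
    print("final iterate:", x)
    print("largest observed delay:", max(delays))
```

The point of the sketch is that the step is computed from the delay actually observed at each update, so updates that arrive with little or no delay take the full `base_step` instead of a step sized for the worst case.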
