

Poster

Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization

Pan Zhou · Xiao-Tong Yuan

Keywords: [ Convex Optimization ] [ Large Scale Learning and Big Data ] [ Optimization - Convex ]


Abstract: Stochastic variance-reduced gradient (SVRG) algorithms have been shown to work favorably in solving large-scale learning problems. Despite this remarkable success, the stochastic gradient complexity of SVRG-type algorithms usually scales linearly with data size and thus could still be expensive for huge data. To address this deficiency, we propose a hybrid stochastic-deterministic minibatch proximal gradient (HSDMPG) algorithm for strongly convex problems that enjoys provably improved, data-size-independent complexity guarantees. More precisely, for a quadratic loss $F(\theta)$ of $n$ components, we prove that HSDMPG can attain an $\epsilon$-optimization-error $\mathbb{E}[F(\theta)-F(\theta^{\star})]\leq\epsilon$ within $\mathcal{O}\big(\kappa^{1.5}\epsilon^{0.25}\log^{1.5}(\tfrac{1}{\epsilon})\cdot\big(\kappa\sqrt{n}\log^{2}(\tfrac{1}{\epsilon})+\kappa^{3}\sqrt{n^{1.5}\epsilon}\big)\big)$ stochastic gradient evaluations, where $\kappa$ is the condition number. For generic strongly convex loss functions, we prove a nearly identical complexity bound, at the cost of slightly increased logarithmic factors. For large-scale learning problems, our complexity bounds are superior to those of prior state-of-the-art SVRG algorithms, with or without dependence on data size. In particular, when $\epsilon=\mathcal{O}(1/n)$, which is of the order of the intrinsic excess-error bound of a learning model and thus suffices for generalization, the stochastic gradient complexity bounds of HSDMPG for quadratic and generic loss functions are respectively $\mathcal{O}(n^{0.875}\log^{1.5}(n))$ and $\mathcal{O}(n^{0.875}\log^{2.25}(n))$, which, to the best of our knowledge, achieve optimal generalization in less than a single pass over the data for the first time. Extensive numerical results demonstrate the computational advantages of our algorithm over prior ones.
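The abstract only names the hybrid stochastic-deterministic idea, so the sketch below is our own minimal illustration, not the authors' HSDMPG: an outer proximal-point-style loop anchored by a deterministic full gradient, with inner minibatch stochastic corrections and a proximal step for the $\ell_2$ regularizer, applied to a ridge-regression quadratic. The function name, step size, and loop counts are all hypothetical illustrative choices.

```python
import numpy as np

def hybrid_prox_gradient(A, b, lam=0.1, outer=20, inner=50, batch=8, step=0.2, seed=0):
    """Illustrative hybrid stochastic-deterministic minibatch proximal gradient
    for the ridge quadratic F(w) = 1/(2n)||Aw - b||^2 + (lam/2)||w||^2.
    A sketch of the general idea only -- NOT the paper's exact HSDMPG;
    all hyperparameters here are hypothetical."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    for _ in range(outer):
        # deterministic component: one full gradient of the smooth data-fit term
        g_full = A.T @ (A @ w - b) / n
        v = w.copy()
        for _ in range(inner):
            idx = rng.integers(0, n, size=batch)
            Ai, bi = A[idx], b[idx]
            # stochastic component: minibatch gradients at the iterate and anchor
            g_v = Ai.T @ (Ai @ v - bi) / batch
            g_w = Ai.T @ (Ai @ w - bi) / batch
            # variance-reduced direction mixing stochastic and deterministic parts
            v = v - step * (g_v - g_w + g_full)
            # proximal step for the l2 regularizer: prox of (lam/2)||.||^2
            v /= 1.0 + step * lam
        w = v
    return w

# usage: compare against the closed-form ridge solution
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
b = rng.standard_normal(200)
w_hat = hybrid_prox_gradient(A, b)
w_star = np.linalg.solve(A.T @ A / 200 + 0.1 * np.eye(5), A.T @ b / 200)
err = float(np.linalg.norm(w_hat - w_star))
```

The deterministic anchor gradient keeps the stochastic directions unbiased with vanishing variance near the optimum, which is the mechanism that lets hybrid schemes of this kind converge linearly on strongly convex problems.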
