Oral
Distributed Learning with Sublinear Communication
Jayadev Acharya · Christopher De Sa · Dylan Foster · Karthik Sridharan
Abstract:
In distributed statistical learning, $N$ samples are split across $m$ machines and a learner wishes to use minimal communication to learn as well as if the examples were on a single machine.
This model has received substantial interest in machine learning due to its scalability and potential for parallel speedup. However, in the high-dimensional regime, where the number of examples is smaller than the number of features (``dimension''), the speedup afforded by distributed learning may be overshadowed by the cost of communicating a single example. This paper investigates
the following question: When is it possible to learn a $d$-dimensional model in the distributed setting with total communication sublinear in $d$?
Starting with a negative result, we show that for learning the usual variants
of (sparse or norm-bounded) linear models, no algorithm can obtain optimal error
until communication is linear in dimension. Our main result is to show
that by slightly relaxing the standard statistical assumptions for
this setting we can obtain distributed algorithms that enjoy optimal
error and communication logarithmic in dimension. Our upper
bounds are based on a family of algorithms that combine mirror descent
with randomized sparsification/quantization of iterates, and extend to
the general stochastic convex optimization model.
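The upper bounds combine mirror descent with unbiased randomized sparsification of the communicated vectors. The following is a minimal sketch of that idea, not the authors' implementation: the function names, the choice of entropic mirror map, the sparsity budget k, and the uniform starting point are illustrative assumptions.

import numpy as np

def sparsify(g, k, rng):
    # Unbiased random sparsification: sample k coordinates with probability
    # proportional to |g_i| and importance-weight them so that E[out] = g.
    # Only the k sampled (index, value) pairs need to be communicated,
    # rather than all d coordinates.
    d = g.shape[0]
    total = np.sum(np.abs(g))
    if total == 0:
        return np.zeros(d)
    p = np.abs(g) / total
    idx = rng.choice(d, size=k, replace=True, p=p)
    out = np.zeros(d)
    np.add.at(out, idx, g[idx] / (k * p[idx]))
    return out

def entropic_mirror_step(w, g_hat, eta):
    # Exponentiated-gradient (entropic mirror descent) update on the simplex,
    # a standard choice for l1-constrained / sparse linear models.
    w_new = w * np.exp(-eta * g_hat)
    return w_new / np.sum(w_new)

def distributed_round(w, local_gradients, k, eta, rng):
    # Each machine sparsifies its local stochastic gradient before sending;
    # a coordinator averages the sparse messages and takes one mirror step.
    msgs = [sparsify(g, k, rng) for g in local_gradients]
    g_hat = np.mean(msgs, axis=0)
    return entropic_mirror_step(w, g_hat, eta)

# Toy usage with stand-in gradients (hypothetical values for illustration):
rng = np.random.default_rng(0)
d, m, k = 10_000, 8, 32
w = np.full(d, 1.0 / d)
grads = [rng.normal(size=d) for _ in range(m)]
w = distributed_round(w, grads, k=k, eta=0.1, rng=rng)

In this sketch each round communicates only O(k) (index, value) pairs per machine, which is how communication can be made logarithmic in the dimension d rather than linear.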