Oral
Distributed Nonparametric Regression under Communication Constraints
Yuancheng Zhu · John Lafferty
This paper studies the problem of nonparametric estimation of a smoothfunction with data distributed across multiple machines. We assume anindependent sample from a white noise model is collected at eachmachine, and an estimator of the underlying true function needs to beconstructed at a central machine. We place limits on the number ofbits that each machine can use to transmit information to the centralmachine. Our results give both asymptotic lower bounds and matchingupper bounds on the statistical risk under various settings. Weidentify three regimes, depending on the relationship among the numberof machines, the size of data available at each machine, and thecommunication budget. When the communication budget is small, thestatistical risk depends solely on this communication bottleneck,regardless of the sample size. In the regime where the communicationbudget is large, the classic minimax risk in the non-distributedestimation setting is recovered. In an intermediate regime, thestatistical risk depends on both the sample size and the communicationbudget.