Poster
Massively Parallel -Means Clustering for Perturbation Resilient Instances
Vincent Cohen-Addad · Vahab Mirrokni · Peilin Zhong
Hall E #1106
Keywords: [ OPT: Large Scale, Parallel and Distributed ] [ MISC: Unsupervised and Semi-supervised Learning ] [ MISC: Scalable Algorithms ] [ Theory ]
Abstract:
We consider -means clustering of data points in Euclidean space in the Massively Parallel Computation (MPC) model, a computational model which is an abstraction of modern massively parallel computing system such as MapReduce. Recent work provides evidence that getting -approximate -means solution for general input points using rounds in the MPC model may be impossible under certain conditions [Ghaffari, Kuhn \& Uitto'2019]. However, the real-world data points usually have better structures. One instance of interest is the set of data points which is perturbation resilient [Bilu \& Linial'2010]. In particular, a point set is -perturbation resilient for -means if perturbing pairwise distances by multiplicative factors in the range does not change the optimum -means clusters. We bypass the worst case lower bound by considering the perturbation resilient input points and showing rounds -means clustering algorithms for these instances in the MPC model. Specifically, we show a fully scalable -approximate -means clustering algorithm for -perturbation resilient instance in the MPC model using rounds and total space. If the space per machine is sufficiently larger than , i.e., at least , we also develop an optimal -means clustering algorithm for -perturbation resilient instance in MPC using rounds and total space.
Chat is not available.