Given the importance of clustering in the analysisof large scale data, distributed algorithms for formulations such as k-means, k-median, etc. have been extensively studied. A successful approach here has been the “reduce and merge” paradigm, in which each machine reduces its input size to Õ(k), and this data reduction continues (possibly iteratively) until all the data fits on one machine, at which point the problem is solved locally. This approach has the intrinsic bottleneck that each machine must solve a problem of size ≥ k, and needs to communicate at least Ω(k) points to the other machines. We propose a novel data partitioning idea to overcome this bottleneck, and in effect, have different machines focus on “finding different clusters”. Under the assumption that we know the optimum value of the objective up to a poly(n) factor (arbitrary polynomial), we establish worst-case approximation guarantees for our method. We see that our algorithm results in lower communication as well as a near-optimal number of ‘rounds’ of computation (in the popular MapReduce framework).
Aditya Bhaskara (University of Utah)
Maheshakya Wijewardena (University of Utah)
I am a fourth-year Ph.D. student at the University of Utah, School of Computing. I received my bachelor's degree in Computer Science and Engineering from University of Moratuwa, Sri Lanka. I'm particularly interested in designing communication and space efficient distributed algorithms with provable guarantees for unsupervised learning tasks such as clustering, PCA, and low-rank matrix approximation. Towards this end, I study algorithms in Massively Parallel Computation (MPC) model such as Map-Reduce as well as in Federated Learning environments and related concerns such as privacy. Recently, I have started studying the robustness of matrix completion techniques and their applications. I am broadly interested in approximation algorithms, communication complexity, and theoretical machine learning topics. In addition to these topics, I have worked on a project that involves designing randomized approaches for protecting location privacy against radio-window attacks which is closely related to game theory in security regime.
Related Events (a corresponding poster, oral, or spotlight)
2018 Oral: Distributed Clustering via LSH Based Data Partitioning »
Fri Jul 13th 09:20 -- 09:30 AM Room K11
More from the Same Authors
2020 Poster: Online Learning with Imperfect Hints »
Aditya Bhaskara · Ashok Cutkosky · Ravi Kumar · Manish Purohit