Timezone: »

Poster
Differentially Private Clustering in High-Dimensional Euclidean Spaces
Nina Balcan · Travis Dick · Yingyu Liang · Wenlong Mou · Hongyang Zhang

Wed Aug 09 01:30 AM -- 05:00 AM (PDT) @ Gallery #86
We study the problem of clustering sensitive data while preserving the privacy of individuals represented in the dataset, which has broad applications in practical machine learning and data analysis tasks. Although the problem has been widely studied in the context of low-dimensional, discrete spaces, much remains unknown concerning private clustering in high-dimensional Euclidean spaces $\R^d$. In this work, we give differentially private and efficient algorithms achieving strong guarantees for $k$-means and $k$-median clustering when $d=\Omega(\polylog(n))$. Our algorithm achieves clustering loss at most $\log^3(n)\OPT+\poly(\log n,d,k)$, advancing the state-of-the-art result of $\sqrt{d}\OPT+\poly(\log n,d^d,k^d)$. We also study the case where the data points are $s$-sparse and show that the clustering loss can scale logarithmically with $d$, i.e., $\log^3(n)\OPT+\poly(\log n,\log d,k,s)$. Experiments on both synthetic and real datasets verify the effectiveness of the proposed method.

#### Author Information

##### Nina Balcan (Carnegie Mellon University)

Maria-Florina Balcan is an Associate Professor in the School of Computer Science at Carnegie Mellon University. Her main research interests are machine learning and theoretical computer science. Her honors include the CMU SCS Distinguished Dissertation Award, an NSF CAREER Award, a Microsoft Faculty Research Fellowship, a Sloan Research Fellowship, and several paper awards. She has served as a Program Committee Co-chair for COLT 2014, a Program Committee Co-chair for ICML 2016, and a board member of the International Machine Learning Society.