Skip to yearly menu bar Skip to main content


Poster
in
Workshop: 2nd Annual Workshop on Topology, Algebra, and Geometry in Machine Learning (TAG-ML)

Differentially Private Clustering in Data Streams

Alessandro Epasto · Tamalika Mukherjee · Peilin Zhong


Abstract: Clustering problems (such as k-means and k-median) are fundamental unsupervised machine learning primitives. Recently, these problems have been subject to large interest in the privacy literature. All prior work on private clustering, however, has been devoted to the \emph{offline} case where the entire dataset is known in advance.In this work, we focus on the more challenging private data stream setting where the aim is to design memory-efficient algorithms that process a large stream \emph{incrementally} as points arrive in a private way. Our main contribution is to provide the first differentially private algorithms for $k$-means and $k$-median clustering in data streams. In particular, our algorithms are the first to guarantee differential privacy both in the continual release and in the one-shot setting while achieving space sublinear in the stream size. We complement our theoretical results with an empirical analysis of our algorithms on real data.

Chat is not available.