Workshop: Theory and Practice of Differential Privacy

Label differential privacy via clustering

Hossein Esfandiari · Vahab Mirrokni · Umar Syed · Sergei Vassilvitskii


We present a new mechanism for label differential privacy, a relaxation of differentially private machine learning that only protects the privacy of the labels in the training set. Our mechanism clusters the examples in the training set using their (non-private) feature vectors, randomly re-samples each label from examples in the same cluster, and outputs a training set with noisy labels as well as a modified version of the true loss function. We prove that when the clusters are both large and high-quality, the model that minimizes the modified loss on the noisy training set converges to small excess risk at a rate that is comparable to the rate for non-private learning. Our experiments show that randomizing the labels within each cluster significantly improves the privacy vs. accuracy trade-off compared to applying uniform randomized response to the labels, and also compared to learning a model via DP-SGD.
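The label-resampling step described above can be sketched roughly as follows. This is an illustrative simplification, not the paper's exact construction: the cluster assignments are taken as given (from any non-private clustering of the feature vectors), the Laplace noise scale is a hedged stand-in for the paper's privacy accounting, and the companion modified loss function is omitted. All names and parameters here are assumptions for illustration.

```python
import numpy as np

def resample_labels(labels, clusters, num_classes, epsilon, rng=None):
    """Resample each training label from a noised empirical label
    distribution of its cluster.

    Simplified sketch: per cluster, build a label histogram, perturb
    it with Laplace noise (scale 1/epsilon is illustrative, not the
    paper's exact calibration), then draw each label in the cluster
    i.i.d. from the normalized noisy histogram.
    """
    rng = np.random.default_rng(rng)
    noisy = np.empty_like(labels)
    for c in np.unique(clusters):
        idx = np.where(clusters == c)[0]
        counts = np.bincount(labels[idx], minlength=num_classes).astype(float)
        # Perturb counts; changing one label changes two counts by 1 each.
        counts += rng.laplace(scale=1.0 / epsilon, size=num_classes)
        probs = np.clip(counts, 0.0, None)
        total = probs.sum()
        if total > 0:
            probs = probs / total
        else:
            # Degenerate noisy histogram: fall back to uniform.
            probs = np.full(num_classes, 1.0 / num_classes)
        noisy[idx] = rng.choice(num_classes, size=len(idx), p=probs)
    return noisy
```

Intuitively, when clusters are large and label-homogeneous, the noisy histogram stays close to the true within-cluster label distribution, so the resampled labels remain informative while any single example's label has limited influence on the output.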