Label differential privacy via clustering
Hossein Esfandiari · Vahab Mirrokni · Umar Syed · Sergei Vassilvitskii

We present a new mechanism for label differential privacy, a relaxation of differentially private machine learning that only protects the privacy of the labels in the training set. Our mechanism clusters the examples in the training set using their (non-private) feature vectors, randomly re-samples each label from examples in the same cluster, and outputs a training set with noisy labels as well as a modified version of the true loss function. We prove that when the clusters are both large and high-quality, the model that minimizes the modified loss on the noisy training set converges to small excess risk at a rate that is comparable to the rate for non-private learning. Our experiments show that randomizing the labels within each cluster significantly improves the privacy vs. accuracy trade-off compared to applying uniform randomized response to the labels, and also compared to learning a model via DP-SGD.
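The core re-sampling step described above can be sketched as follows. This is a minimal illustration, not the paper's exact mechanism: the function name `resample_labels_within_clusters` is hypothetical, the clustering is assumed to be given, and each label is simply redrawn uniformly from its cluster's label multiset. The actual mechanism calibrates the re-sampling noise (e.g. via a randomized-response-style distribution) to achieve a formal label-DP guarantee, which this uniform sketch alone does not.

```python
import numpy as np

def resample_labels_within_clusters(labels, cluster_ids, rng):
    """Replace each example's label with one drawn uniformly from the
    labels of examples in the same cluster.

    A simplified stand-in for the paper's noisy label re-sampling step;
    it assumes the (non-private) feature-based clustering has already
    been computed and is passed in as `cluster_ids`.
    """
    labels = np.asarray(labels)
    cluster_ids = np.asarray(cluster_ids)
    noisy = np.empty_like(labels)
    for c in np.unique(cluster_ids):
        idx = np.where(cluster_ids == c)[0]
        # Draw each new label uniformly from this cluster's label multiset.
        noisy[idx] = rng.choice(labels[idx], size=idx.size, replace=True)
    return noisy
```

Intuitively, when clusters are large and label-homogeneous ("high-quality" in the paper's terms), a label redrawn from the same cluster is usually close to the true label, so the noisy training set remains informative while individual labels are obscured.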

Author Information

Hossein Esfandiari (Google Research)
Vahab Mirrokni (Google Research)
Umar Syed (Google)
Sergei Vassilvitskii (Google)
