Label differential privacy via clustering
Hossein Esfandiari · Vahab Mirrokni · Umar Syed · Sergei Vassilvitskii

We present a new mechanism for label differential privacy, a relaxation of differentially private machine learning that only protects the privacy of the labels in the training set. Our mechanism clusters the examples in the training set using their (non-private) feature vectors, randomly re-samples each label from examples in the same cluster, and outputs a training set with noisy labels as well as a modified version of the true loss function. We prove that when the clusters are both large and high-quality, the model that minimizes the modified loss on the noisy training set converges to small excess risk at a rate that is comparable to the rate for non-private learning. Our experiments show that randomizing the labels within each cluster significantly improves the privacy vs. accuracy trade-off compared to applying uniform randomized response to the labels, and also compared to learning a model via DP-SGD.
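The core re-sampling step described above can be sketched as follows. This is a minimal illustration, not the paper's exact mechanism: the function name `resample_labels_within_clusters` is hypothetical, the clustering is assumed to be given, and each label is simply redrawn uniformly from its cluster's label multiset. The actual mechanism calibrates the re-sampling noise (e.g. via a randomized-response-style distribution) to achieve a formal label-DP guarantee, which this uniform sketch alone does not.

```python
import numpy as np

def resample_labels_within_clusters(labels, cluster_ids, rng):
    """Replace each example's label with one drawn uniformly from the
    labels of examples in the same cluster.

    A simplified stand-in for the paper's noisy label re-sampling step;
    it assumes the (non-private) feature-based clustering has already
    been computed and is passed in as `cluster_ids`.
    """
    labels = np.asarray(labels)
    cluster_ids = np.asarray(cluster_ids)
    noisy = np.empty_like(labels)
    for c in np.unique(cluster_ids):
        idx = np.where(cluster_ids == c)[0]
        # Draw each new label uniformly from this cluster's label multiset.
        noisy[idx] = rng.choice(labels[idx], size=idx.size, replace=True)
    return noisy
```

Intuitively, when clusters are large and label-homogeneous ("high-quality" in the paper's terms), a label redrawn from the same cluster is usually close to the true label, so the noisy training set remains informative while individual labels are obscured.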

Author Information

Hossein Esfandiari (Google Research)
Vahab Mirrokni (Google Research)
Umar Syed (Google)
Sergei Vassilvitskii (Google)
