In this paper, we revisit the problem of using in-distribution public data to improve the privacy/utility trade-offs of differentially private (DP) model training. (Here, public data refers to auxiliary datasets that carry no privacy concerns.) We design a natural variant of DP mirror descent, where the DP gradients of the private/sensitive data act as the linear term, and the loss generated by the public data acts as the mirror map. We show that, for linear regression with feature vectors drawn from a non-isotropic sub-Gaussian distribution, our algorithm, PDA-DPMD (a variant of mirror descent), provides population risk guarantees that are asymptotically better than the best known guarantees under DP (without access to public data), when the number of public data samples is sufficiently large. We further show that our algorithm has natural "noise stability" properties that control the variance due to the noise added to ensure DP. We demonstrate the efficacy of our algorithm by showing privacy/utility trade-offs on four benchmark datasets (StackOverflow, WikiText-2, CIFAR-10, and EMNIST). We show that our algorithm not only significantly improves over traditional DP-SGD, which does not have access to public data, but, to our knowledge, is also the first to improve over DP-SGD on models that have been pre-trained with public data.
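To make the update described in the abstract concrete, below is a minimal, illustrative sketch in NumPy for the linear-regression setting: a clipped, noised gradient of the private loss supplies the linear term, and the Bregman divergence of the public loss serves as the mirror map. All names (pda_dpmd, pub_grad, clip_norm, noise_mult, inner_steps, inner_lr), the inner-loop approximation of the mirror step, and the noise calibration are illustrative choices made here, not details taken from the paper; the paper's exact clipping, noise scaling, and update rule may differ.

```python
# Illustrative sketch only; not the authors' implementation of PDA-DPMD.
import numpy as np

def pda_dpmd(X_priv, y_priv, X_pub, y_pub, steps=100, lr=0.1,
             clip_norm=1.0, noise_mult=1.0, inner_steps=5, inner_lr=0.05, seed=0):
    """Public-data-assisted mirror descent sketch for least-squares regression."""
    rng = np.random.default_rng(seed)
    n_priv, d = X_priv.shape
    w = np.zeros(d)

    # Mirror map psi(v) = mean squared error on the *public* data.
    def pub_grad(v):
        return X_pub.T @ (X_pub @ v - y_pub) / len(y_pub)

    for _ in range(steps):
        # Linear term: DP estimate of the private-data gradient
        # (per-example clipping + Gaussian noise, as in DP-SGD).
        residuals = X_priv @ w - y_priv
        per_ex = residuals[:, None] * X_priv                      # per-example gradients
        norms = np.linalg.norm(per_ex, axis=1, keepdims=True)
        per_ex *= np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
        g_priv = (per_ex.sum(axis=0)
                  + noise_mult * clip_norm * rng.standard_normal(d)) / n_priv

        # Mirror-descent step: approximately solve
        #   w_{t+1} = argmin_v  lr * <g_priv, v> + D_psi(v, w_t),
        # where D_psi is the Bregman divergence of the public loss psi.
        # A few plain gradient steps on this convex subproblem suffice for the sketch.
        anchor = pub_grad(w)
        v = w.copy()
        for _ in range(inner_steps):
            v -= inner_lr * (lr * g_priv + pub_grad(v) - anchor)
        w = v
    return w
```

For example, one could call `w = pda_dpmd(X_priv, y_priv, X_pub, y_pub)` on synthetic data and compare against a plain DP-SGD baseline; the point of the sketch is only to show where the public loss enters the update, namely through the Bregman divergence rather than through the noisy gradient.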
Author Information
Ehsan Amid (Google Brain)
Arun Ganesh (UC Berkeley)
Rajiv Mathews (Google)
Swaroop Ramaswamy (Google)
Shuang Song (Google)
Thomas Steinke (Google)
Vinith Suriyakumar (Massachusetts Institute of Technology)
Om Thakkar (Google)
Abhradeep Guha Thakurta (Google)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Spotlight: Public Data-Assisted Mirror Descent for Private Model Training
  Tue. Jul 19th, 03:45 -- 03:50 PM, Room: Ballroom 1 & 2
More from the Same Authors
- 2021 : The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation
  Peter Kairouz · Ziyu Liu · Thomas Steinke
- 2021 : Practical and Private (Deep) Learning without Sampling or Shuffling
  Peter Kairouz · Hugh B McMahan · Shuang Song · Om Dipakbhai Thakkar · Abhradeep Guha Thakurta · Zheng Xu
- 2021 : Differentially Private Model Personalization
  Prateek Jain · J K Rush · Adam Smith · Shuang Song · Abhradeep Guha Thakurta
- 2021 : The Flajolet-Martin Sketch Itself Preserves Differential Privacy: Private Counting with Minimal Space
  Adam Smith · Shuang Song · Abhradeep Guha Thakurta
- 2021 : Privately Learning Subspaces
  Vikrant Singhal · Thomas Steinke
- 2021 : Private Alternating Least Squares: Practical Private Matrix Completion with Tighter Rates
  Steve Chien · Prateek Jain · Walid Krichene · Steffen Rendle · Shuang Song · Abhradeep Guha Thakurta · Li Zhang
- 2023 : To Aggregate or Not? Learning with Separate Noisy Labels
  Jiaheng Wei · Zhaowei Zhu · Tianyi Luo · Ehsan Amid · Abhishek Kumar · Yang Liu
- 2023 : Privacy Auditing with One (1) Training Run
  Thomas Steinke · Milad Nasresfahani · Matthew Jagielski
- 2023 Poster: When Personalization Harms Performance: Reconsidering the Use of Group Attributes in Prediction
  Vinith Suriyakumar · Marzyeh Ghassemi · Berk Ustun
- 2023 Poster: Why Is Public Pretraining Necessary for Private Model Training?
  Arun Ganesh · Mahdi Haghifam · Milad Nasresfahani · Sewoong Oh · Thomas Steinke · Om Thakkar · Abhradeep Guha Thakurta · Lun Wang
- 2023 Oral: Multi-Epoch Matrix Factorization Mechanisms for Private Machine Learning
  Christopher Choquette-Choo · Hugh B McMahan · J K Rush · Abhradeep Guha Thakurta
- 2023 Oral: When Personalization Harms Performance: Reconsidering the Use of Group Attributes in Prediction
  Vinith Suriyakumar · Marzyeh Ghassemi · Berk Ustun
- 2023 Poster: Multi-Task Differential Privacy Under Distribution Skew
  Walid Krichene · Prateek Jain · Shuang Song · Mukund Sundararajan · Abhradeep Guha Thakurta · Li Zhang
- 2023 Poster: Multi-Epoch Matrix Factorization Mechanisms for Private Machine Learning
  Christopher Choquette-Choo · Hugh B McMahan · J K Rush · Abhradeep Guha Thakurta
- 2021 : Invited Talk: Thomas Steinke
  Thomas Steinke
- 2021 Poster: Practical and Private (Deep) Learning Without Sampling or Shuffling
  Peter Kairouz · Brendan McMahan · Shuang Song · Om Dipakbhai Thakkar · Abhradeep Guha Thakurta · Zheng Xu
- 2021 Poster: The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation
  Peter Kairouz · Ziyu Liu · Thomas Steinke
- 2021 Spotlight: The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation
  Peter Kairouz · Ziyu Liu · Thomas Steinke
- 2021 Spotlight: Practical and Private (Deep) Learning Without Sampling or Shuffling
  Peter Kairouz · Brendan McMahan · Shuang Song · Om Dipakbhai Thakkar · Abhradeep Guha Thakurta · Zheng Xu
- 2021 Poster: Private Alternating Least Squares: Practical Private Matrix Completion with Tighter Rates
  Steve Chien · Prateek Jain · Walid Krichene · Steffen Rendle · Shuang Song · Abhradeep Guha Thakurta · Li Zhang
- 2021 Oral: Private Alternating Least Squares: Practical Private Matrix Completion with Tighter Rates
  Steve Chien · Prateek Jain · Walid Krichene · Steffen Rendle · Shuang Song · Abhradeep Guha Thakurta · Li Zhang
- 2020 Poster: New Oracle-Efficient Algorithms for Private Synthetic Data Release
  Giuseppe Vietri · Grace Tian · Mark Bun · Thomas Steinke · Steven Wu