Pretraining on public data has been widely reported to yield remarkable improvements in the privacy-utility tradeoff of models trained on benchmark language and vision tasks. Some gain is expected, since these models inherit the benefits of transfer learning, which is the standard motivation for pretraining in non-private settings. However, the stark contrast between the gains from pretraining in non-private and private machine learning suggests that the gain in the private setting is rooted in a fundamentally different cause. To explain this phenomenon, we hypothesize that the non-convex loss landscape of model training requires the optimization algorithm to go through two phases. In the first, the algorithm needs to select a good "basin" in the loss landscape. In the second, the algorithm solves an easy optimization problem within that basin. The former is harder to solve with private data, while the latter is harder to solve with public data due to distribution shift or data scarcity. Guided by this intuition, we provide theoretical constructions that provably demonstrate the separation between private training with and without public pretraining. Further, systematic experiments on CIFAR10 and Librispeech provide supporting evidence for our hypothesis.
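The two-phase intuition in the abstract can be illustrated with a toy one-dimensional sketch. This is not the paper's construction or its experimental setup. The loss functions, the step sizes, and the helper `noisy_gd` are all hypothetical, and the clipped, noised gradient step only mimics the shape of private gradient descent, with no privacy accounting.

```python
import random

def private_loss(w):
    # Assumed toy loss: a poor basin near w = -2 (loss 1) and a good basin near w = 2 (loss 0).
    return min((w + 2.0) ** 2 + 1.0, (w - 2.0) ** 2)

def private_grad(w):
    # Gradient of the active branch of the min; the branches cross at w = -0.125.
    return 2.0 * (w + 2.0) if w < -0.125 else 2.0 * (w - 2.0)

def public_grad(w):
    # Assumed public loss (w - 1.5)^2: one basin, with its minimum shifted
    # away from the private optimum to mimic distribution shift.
    return 2.0 * (w - 1.5)

def noisy_gd(w, grad_fn, steps, lr, clip, noise_std, rng):
    # Gradient descent with clipping and Gaussian noise added to each gradient.
    for _ in range(steps):
        g = max(-clip, min(clip, grad_fn(w)))
        g += rng.gauss(0.0, noise_std)
        w -= lr * g
    return w

rng = random.Random(0)
w0 = -2.0  # initialization that happens to sit in the poor basin

# Phase 1 (public, no noise or clipping): selects the good basin.
w_public = noisy_gd(w0, public_grad, 200, 0.05, float("inf"), 0.0, rng)

# Phase 2 (private, clipped and noised): easy optimization inside that basin.
w_pre = noisy_gd(w_public, private_grad, 200, 0.05, 1.0, 1.0, rng)

# Baseline: private training from the original initialization.
w_scratch = noisy_gd(w0, private_grad, 200, 0.05, 1.0, 1.0, rng)

print("pretrained:", w_pre, private_loss(w_pre))
print("scratch:   ", w_scratch, private_loss(w_scratch))
```

In this sketch the public phase alone stops at the shifted minimum, and the private phase alone stays in the poor basin. Only the combination reaches the neighborhood of the private optimum.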
Author Information
Arun Ganesh (Google)
Mahdi Haghifam (University of Toronto/Vector Institute)
Milad Nasresfahani (Google)
Sewoong Oh (University of Washington)
Thomas Steinke (Google)
Om Thakkar (Google)
Abhradeep Guha Thakurta (Google DeepMind)
Lun Wang (Google)
More from the Same Authors
-
2021 : Towards a Unified Information-Theoretic Framework for Generalization »
Mahdi Haghifam · Gintare Karolina Dziugaite · Shay Moran -
2021 : The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation »
Peter Kairouz · Ziyu Liu · Thomas Steinke -
2021 : Robust and Differentially Private Covariance Estimation »
Logan Gnanapragasam · Jonathan Hayase · Sewoong Oh -
2021 : Practical and Private (Deep) Learning without Sampling or Shuffling »
Peter Kairouz · Hugh B McMahan · Shuang Song · Om Dipakbhai Thakkar · Abhradeep Guha Thakurta · Zheng Xu -
2021 : Differentially Private Model Personalization »
Prateek Jain · J K Rush · Adam Smith · Shuang Song · Abhradeep Guha Thakurta -
2021 : The Flajolet-Martin Sketch Itself Preserves Differential Privacy: Private Counting with Minimal Space »
Adam Smith · Shuang Song · Abhradeep Guha Thakurta -
2021 : Privately Learning Subspaces »
Vikrant Singhal · Thomas Steinke -
2021 : Private Alternating Least Squares: Practical Private Matrix Completion with Tighter Rates »
Steve Chien · Prateek Jain · Walid Krichene · Steffen Rendle · Shuang Song · Abhradeep Guha Thakurta · Li Zhang -
2023 : Algorithms for Optimal Adaptation of Diffusion Models to Reward Functions »
Krishnamurthy Dvijotham · Shayegan Omidshafiei · Kimin Lee · Katie Collins · Deepak Ramachandran · Adrian Weller · Mohammad Ghavamzadeh · Milad Nasresfahani · Ying Fan · Jeremiah Liu -
2023 : Unleashing the Power of Randomization in Auditing Differentially Private ML »
Krishna Pillutla · Galen Andrew · Peter Kairouz · Hugh B McMahan · Alina Oprea · Sewoong Oh -
2023 : Privacy Auditing with One (1) Training Run »
Thomas Steinke · Milad Nasresfahani · Matthew Jagielski -
2023 : Can Public Large Language Models Help Private Cross-device Federated Learning? »
Boxin Wang · Yibo J. Zhang · Yuan Cao · Bo Li · Hugh B McMahan · Sewoong Oh · Zheng Xu · Manzil Zaheer -
2023 Poster: Secure Federated Correlation Test and Entropy Estimation »
Qi Pang · Lun Wang · Shuai Wang · Wenting Zheng · Dawn Song -
2023 Oral: Multi-Epoch Matrix Factorization Mechanisms for Private Machine Learning »
Christopher Choquette-Choo · Hugh B McMahan · J K Rush · Abhradeep Guha Thakurta -
2023 Poster: Effectively Using Public Data in Privacy Preserving Machine Learning »
Milad Nasresfahani · Saeed Mahloujifar · Xinyu Tang · Prateek Mittal · Amir Houmansadr -
2023 Poster: Multi-Task Differential Privacy Under Distribution Skew »
Walid Krichene · Prateek Jain · Shuang Song · Mukund Sundararajan · Abhradeep Guha Thakurta · Li Zhang -
2023 Poster: CRISP: Curriculum based Sequential neural decoders for Polar code family »
S Ashwin Hebbar · Viraj Nadkarni · Ashok Vardhan Makkuva · Suma Bhat · Sewoong Oh · Pramod Viswanath -
2023 Poster: Private Federated Learning with Autotuned Compression »
Enayat Ullah · Christopher Choquette-Choo · Peter Kairouz · Sewoong Oh -
2023 Poster: Multi-Epoch Matrix Factorization Mechanisms for Private Machine Learning »
Christopher Choquette-Choo · Hugh B McMahan · J K Rush · Abhradeep Guha Thakurta -
2022 Poster: Public Data-Assisted Mirror Descent for Private Model Training »
Ehsan Amid · Arun Ganesh · Rajiv Mathews · Swaroop Ramaswamy · Shuang Song · Thomas Steinke · Vinith Suriyakumar · Om Thakkar · Abhradeep Guha Thakurta -
2022 Spotlight: Public Data-Assisted Mirror Descent for Private Model Training »
Ehsan Amid · Arun Ganesh · Rajiv Mathews · Swaroop Ramaswamy · Shuang Song · Thomas Steinke · Vinith Suriyakumar · Om Thakkar · Abhradeep Guha Thakurta -
2021 : Invited Talk: Thomas Steinke »
Thomas Steinke -
2021 Poster: Practical and Private (Deep) Learning Without Sampling or Shuffling »
Peter Kairouz · Brendan McMahan · Shuang Song · Om Dipakbhai Thakkar · Abhradeep Guha Thakurta · Zheng Xu -
2021 Spotlight: Practical and Private (Deep) Learning Without Sampling or Shuffling »
Peter Kairouz · Brendan McMahan · Shuang Song · Om Dipakbhai Thakkar · Abhradeep Guha Thakurta · Zheng Xu -
2021 Poster: Private Alternating Least Squares: Practical Private Matrix Completion with Tighter Rates »
Steve Chien · Prateek Jain · Walid Krichene · Steffen Rendle · Shuang Song · Abhradeep Guha Thakurta · Li Zhang -
2021 Oral: Private Alternating Least Squares: Practical Private Matrix Completion with Tighter Rates »
Steve Chien · Prateek Jain · Walid Krichene · Steffen Rendle · Shuang Song · Abhradeep Guha Thakurta · Li Zhang -
2020 Poster: New Oracle-Efficient Algorithms for Private Synthetic Data Release »
Giuseppe Vietri · Grace Tian · Mark Bun · Thomas Steinke · Steven Wu -
2019 Poster: Breaking the gridlock in Mixture-of-Experts: Consistent and Efficient Algorithms »
Ashok Vardhan Makkuva · Pramod Viswanath · Sreeram Kannan · Sewoong Oh -
2019 Oral: Breaking the gridlock in Mixture-of-Experts: Consistent and Efficient Algorithms »
Ashok Vardhan Makkuva · Pramod Viswanath · Sreeram Kannan · Sewoong Oh -
2019 Poster: Rate Distortion For Model Compression: From Theory To Practice »
Weihao Gao · Yu-Han Liu · Chong Wang · Sewoong Oh -
2019 Oral: Rate Distortion For Model Compression: From Theory To Practice »
Weihao Gao · Yu-Han Liu · Chong Wang · Sewoong Oh