

Quantitative Reasoning About Data Privacy in Machine Learning

Chuan Guo · Reza Shokri

Moderator: Laurens van der Maaten

Ballroom 1 & 2


Machine learning algorithms leak a significant amount of information about their training data. A legitimate user of a model can reconstruct sensitive information about the training data from the model's predictions or parameters. Given that privacy policies and regulations call for privacy auditing of algorithms, including machine learning algorithms, we are interested in a generic approach to quantitative reasoning about the privacy risks of various machine learning algorithms. Differentially private (DP) machine learning is currently the most widely accepted framework for privacy-preserving machine learning on sensitive data. The framework prescribes a rigorous accounting, through statistical divergences, of how much information about the training data leaks through the learning algorithm. However, it is often difficult to interpret this mathematical guarantee in terms of how much a randomized algorithm limits what an adversary can infer about one's data. For example, if a model is trained on my private emails containing personal information such as my credit card number, does DP with ε = 10 prevent the model from leaking my credit card number? If I am a patient participating in a personalized cancer treatment prediction study, does DP with ε = 5 prevent others from identifying my membership in this study (and hence that I have cancer)? In this tutorial, we present a unified view of recent works that translate privacy bounds into practical inference attacks and provide a rigorous quantitative understanding of DP machine learning. The objective is to clarify the relations between privacy concepts, inference attacks, protection mechanisms, and tools, and to make the whole field more understandable to ML researchers and engineers.
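One standard way to turn an ε value into an inference-attack statement is the hypothesis-testing view of (ε, δ)-DP: any test that tries to decide whether a given record was in the training set must satisfy FPR + e^ε·FNR ≥ 1 − δ and e^ε·FPR + FNR ≥ 1 − δ. The sketch below is a minimal illustration under that characterization and assumes a balanced prior (each record is a member with probability 1/2). It is not the tutorial's own method, and the function names `mi_accuracy_bound` and `mi_tpr_bound` are hypothetical.

```python
import math


def mi_accuracy_bound(eps: float, delta: float = 0.0) -> float:
    """Upper bound on membership-inference accuracy against an (eps, delta)-DP
    mechanism, assuming a balanced prior (member with probability 1/2)."""
    # Minimising FPR + FNR under the two hypothesis-testing constraints
    #   FPR + e^eps * FNR >= 1 - delta  and  e^eps * FPR + FNR >= 1 - delta
    # gives FPR = FNR = (1 - delta) / (1 + e^eps), so accuracy = 1 - that value.
    return (math.exp(eps) + delta) / (1.0 + math.exp(eps))


def mi_tpr_bound(eps: float, delta: float, fpr: float) -> float:
    """Largest true-positive rate any membership test can reach at a given
    false-positive rate against an (eps, delta)-DP mechanism."""
    if not 0.0 <= fpr <= 1.0:
        raise ValueError("fpr must lie in [0, 1]")
    e = math.exp(eps)
    # TPR = 1 - FNR; each constraint above yields one upper bound on TPR.
    return min(1.0, e * fpr + delta, 1.0 - (1.0 - delta - fpr) / e)


if __name__ == "__main__":
    for eps in (1.0, 5.0, 10.0):
        print(f"eps={eps:>4}: accuracy <= {mi_accuracy_bound(eps):.4f}, "
              f"TPR@1%FPR <= {mi_tpr_bound(eps, 1e-5, 0.01):.4f}")
```

For pure ε-DP the accuracy bound reduces to e^ε / (1 + e^ε), which is about 0.731 at ε = 1 and about 0.993 at ε = 5. These are worst-case bounds over all possible attacks; they say nothing about what a particular attack achieves in practice, which is the gap the tutorial's attack-based analysis addresses.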
