Poster in Workshop: Next Generation of AI Safety

Privacy Auditing of Large Language Models

Ashwinee Panda · Xinyu Tang · Milad Nasr · Christopher A. Choquette-Choo · Prateek Mittal

Keywords: [ privacy auditing ] [ differential privacy ]


Abstract: An important research question is better understanding the privacy leakage of LLMs. The most practical and common way to understand privacy leakage is through a privacy audit, and the first step in a successful privacy audit is a good membership inference attack. A major challenge in privacy auditing large language models (LLMs) is the development of effective membership inference attacks. Current methods rely on basic approaches to generate canaries, which may be suboptimal for measuring privacy leakage and can underestimate it. In this work, we introduce a novel method to generate more effective canaries for membership inference attacks on LLMs. Through experiments on fine-tuned LLMs, we demonstrate that our approach significantly improves the detection of privacy leakage compared to existing methods. For non-privately trained LLMs, our attack achieves $64.2\%$ TPR at $0.01\%$ FPR, largely surpassing the prior attack, which achieves $36.8\%$ TPR at $0.01\%$ FPR. Our method can be used to provide a privacy audit of $\varepsilon \approx 1$ for a model trained with a theoretical $\varepsilon$ of 4. To the best of our knowledge, this is the first time a privacy audit of LLM training has achieved nontrivial auditing success in the setting where the attacker cannot train shadow models, insert gradient canaries, or access the model at every iteration.
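
The abstract reports attack performance as TPR at a very low FPR and translates it into an audited $\varepsilon$. As an illustration only (not the authors' implementation), the sketch below shows both steps under standard assumptions: a score-thresholding membership inference attack over canary scores, and the $(\varepsilon, \delta)$-DP hypothesis-testing constraint $\mathrm{TPR} \le e^{\varepsilon}\,\mathrm{FPR} + \delta$, which gives the point estimate $\varepsilon \ge \ln\!\big((\mathrm{TPR}-\delta)/\mathrm{FPR}\big)$. A rigorous audit would further account for sampling uncertainty (e.g., Clopper-Pearson confidence intervals over the canary set); all function names and score distributions here are hypothetical.

```python
import numpy as np

def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=1e-4):
    """TPR of a score-thresholding attack at the threshold achieving target_fpr."""
    nonmember_scores = np.sort(np.asarray(nonmember_scores))
    n = len(nonmember_scores)
    # Choose the threshold so that at most a target_fpr fraction of
    # non-member canaries score strictly above it.
    k = min(max(int(np.ceil((1.0 - target_fpr) * n)) - 1, 0), n - 1)
    threshold = nonmember_scores[k]
    return float(np.mean(np.asarray(member_scores) > threshold))

def epsilon_lower_bound(tpr, fpr, delta=1e-5):
    """Point estimate of epsilon implied by TPR <= exp(eps) * FPR + delta."""
    if fpr <= 0.0:
        return float("inf")
    if tpr <= delta:
        return 0.0
    return float(np.log((tpr - delta) / fpr))

# Hypothetical usage: scores could be, e.g., negative per-canary losses
# under the audited model (higher score = more member-like).
rng = np.random.default_rng(0)
member_scores = rng.normal(1.0, 1.0, 10_000)     # canaries included in training
nonmember_scores = rng.normal(0.0, 1.0, 10_000)  # held-out canaries
tpr = tpr_at_fpr(member_scores, nonmember_scores, target_fpr=1e-4)
print(f"TPR@0.01% FPR: {tpr:.3f}, implied eps >= {epsilon_lower_bound(tpr, 1e-4):.2f}")
```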