Poster in Workshop on Theoretical Foundations of Foundation Models (TF2M)
Fine-Tuning Large Language Models with User-Level Differential Privacy
Zachary Charles · Arun Ganesh · Ryan McKenna · Hugh B McMahan · Nicole Mitchell · Krishna Pillutla · J K Rush
We investigate practical and scalable algorithms for training large language models (LLMs) with user-level differential privacy (ULDP). We study variants of DP-SGD that use example-level sampling (ELS) and user-level sampling (ULS). We derive a novel ULDP accountant that computes provably tight privacy guarantees for ELS, and use it to show that while ELS outperforms ULS in specific settings, ULS performs better when users have diverse collections of examples. We validate our findings in realistic LLM fine-tuning tasks under fixed compute budgets. Our results show that ULS is significantly better when (1) strong privacy guarantees are required, or (2) the compute budget is large. Our focus on LLM-compatible training algorithms allows us to scale to models with hundreds of millions of parameters and datasets with hundreds of thousands of users.
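The abstract contrasts two batch-construction schemes for DP-SGD: example-level sampling (ELS) and user-level sampling (ULS). As a minimal sketch of the distinction (not the authors' implementation; all function and variable names here are illustrative), ELS draws a batch uniformly over all examples regardless of which user contributed them, while ULS first samples users and then bounds each sampled user's contribution:

```python
import random

# Illustrative sketch of the two sampling schemes the abstract compares.
# Function names (sample_els, sample_uls) and parameters are hypothetical.

def sample_els(examples, batch_size):
    """Example-level sampling (ELS): draw a batch uniformly over all
    examples, ignoring user membership. One user's examples can dominate
    a batch, which a user-level DP accountant must account for."""
    return random.sample(examples, batch_size)

def sample_uls(user_to_examples, users_per_batch, examples_per_user):
    """User-level sampling (ULS): sample users first, then draw a bounded
    number of examples from each sampled user, so each user's total
    contribution to the batch is capped."""
    users = random.sample(list(user_to_examples), users_per_batch)
    batch = []
    for u in users:
        k = min(examples_per_user, len(user_to_examples[u]))
        batch.extend(random.sample(user_to_examples[u], k))
    return batch

# Toy data: 5 users holding varying numbers of examples.
user_to_examples = {
    f"user{u}": [f"user{u}/ex{i}" for i in range(random.randint(2, 10))]
    for u in range(5)
}
all_examples = [ex for exs in user_to_examples.values() for ex in exs]

print("ELS batch:", sample_els(all_examples, 6))
print("ULS batch:", sample_uls(user_to_examples,
                               users_per_batch=3, examples_per_user=2))
```

Under this framing, ULS's per-user cap is what makes it robust when users hold diverse collections of examples, matching the regimes the abstract identifies where ULS wins.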