

Poster

Privacy-Preserving Instructions for Aligning Large Language Models

Da Yu · Peter Kairouz · Sewoong Oh · Zheng Xu


Abstract:

Service providers of large language model (LLM) applications collect user inputs to improve their models. These inputs often contain sensitive information, raising significant privacy concerns. However, existing approaches for protecting the privacy of user instructions fall short; for example, they overlook the risks introduced by the human annotators who handle the data. To address these limitations, we propose using differentially private synthetic instructions as a substitute for real instructions. Typically, generating differentially private synthetic data involves privately training a generative model and then sampling from it. However, we find a non-trivial distributional gap between real instructions and synthetic instructions sampled from private generative models. To bridge this gap, we introduce a differentially private filtering algorithm that refines the initial synthetic instructions. Our extensive experiments demonstrate the high utility of filtered synthetic instructions in both supervised fine-tuning and reinforcement learning from human feedback. For example, in supervised fine-tuning, models trained with filtered synthetic instructions are comparable to, if not better than, leading open-source models such as LLaMA2-Chat and Vicuna.
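The abstract does not specify how the differentially private filtering works. The snippet below is a minimal, hypothetical sketch of one way such filtering could be realized: cluster instructions in an embedding space, estimate the real instructions' cluster histogram under the Laplace mechanism, and resample the synthetic instructions toward that noisy target. The function name `dp_filter_synthetic`, the use of KMeans, the Laplace mechanism, and the random embeddings standing in for text-encoder outputs are all illustrative assumptions, not the authors' algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans


def dp_filter_synthetic(real_emb, syn_emb, num_clusters=10, epsilon=1.0,
                        num_keep=1000, seed=0):
    """Resample synthetic instructions so their cluster histogram matches a
    DP (Laplace-mechanism) estimate of the real instructions' histogram.

    Hypothetical sketch: real_emb / syn_emb are embedding matrices for real
    and synthetic instructions; returns indices of kept synthetic samples.
    """
    rng = np.random.default_rng(seed)

    # Cluster in embedding space using only the synthetic (non-sensitive) data.
    km = KMeans(n_clusters=num_clusters, random_state=seed, n_init=10).fit(syn_emb)

    # Histogram of real instructions over the clusters; each real instruction
    # falls into exactly one bin, so the L1 sensitivity is 1.
    real_counts = np.bincount(km.predict(real_emb), minlength=num_clusters).astype(float)
    noisy_counts = real_counts + rng.laplace(scale=1.0 / epsilon, size=num_clusters)
    target = np.clip(noisy_counts, 0.0, None)
    target = target / target.sum()

    # Per-sample sampling weight: target mass of its cluster, spread evenly
    # over the synthetic samples in that cluster.
    syn_labels = km.labels_
    cluster_sizes = np.bincount(syn_labels, minlength=num_clusters)
    probs = target[syn_labels] / np.maximum(cluster_sizes[syn_labels], 1)
    probs = probs / probs.sum()

    return rng.choice(len(syn_emb), size=num_keep, replace=True, p=probs)


# Toy usage with random vectors standing in for instruction embeddings.
real = np.random.randn(500, 32)
syn = np.random.randn(5000, 32)
kept_idx = dp_filter_synthetic(real, syn, num_clusters=8, epsilon=1.0, num_keep=200)
```

Under this sketch, only the noisy histogram touches the real user data, so the privacy cost of filtering is a single Laplace-mechanism release on top of whatever budget the privately trained generator consumes.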
