Tutorial
DP-fy your DATA: How to (and why) synthesize Differentially Private Synthetic Data
Natalia Ponomareva · Sergei Vassilvitskii · Peter Kairouz · Alex Bie
This tutorial focuses on the increasingly important area of differentially private (DP) synthetic data generation, addressing the need for robust anonymization in machine learning. Creating DP synthetic data allows for data sharing and analysis without compromising individuals' privacy, opening up possibilities for collaborative research and model training. The tutorial aims to bridge the gap between various related fields, such as DP training, DP inference, and empirical privacy testing, providing a comprehensive guide for generating DP synthetic data across different data types.
The tutorial will cover various aspects of DP synthetic data generation, starting with an introduction to the different types of synthetic data and their benefits. It will then provide a brief overview of differential privacy, focusing on the key concepts needed to understand the subsequent sections. The core of the tutorial will delve into specific methods for generating DP synthetic data for tabular, image, and text data, with a significant emphasis on text data generation. The tutorial will elaborate on the main components of a DP synthetic data generation system, including which privacy guarantees to aim for and which contribution constraints to apply to the data. It will also review best practices for handling sensitive data and empirical privacy testing. Finally, the tutorial will conclude with a discussion of open questions and challenges in the field.