Poster Mon, Jul 6, 2026 • 6:30 PM – 8:15 PM PDT HALL A #3200

Differentially Private Synthetic Data via APIs 4: Tabular Data

Toan Tran ⋅ Arturs Backurs ⋅ Zinan Lin ⋅ Victor Reis ⋅ Li Xiong ⋅ Sergey Yekhanin

Abstract

This paper investigates the problem of generating synthetic tabular data with differential privacy (DP) guarantees, enabling data sharing in sensitive domains. Despite extensive study, state-of-the-art methods often focus on minimizing low-order marginal query errors and overlook the challenges posed by high-order correlations. To address this gap, we extend the Private Evolution (PE) framework, originally developed for DP-compliant image and text synthesis, to tabular data. We introduce Tab-PE, an algorithm for synthetic tabular data generation under DP constraints. Tab-PE iteratively improves a candidate dataset via an evolutionary process: it uses tabular-specialized operators to produce variations, privately scores them, and selects the highest-quality samples to retain and propagate. Whereas the original PE relies on large foundation models, Tab-PE employs heuristic operators with significantly lower computational cost, making PE more practical and scalable for tabular data. Through extensive experiments on real-world and simulated datasets, we demonstrate that Tab-PE substantially outperforms prior baselines on datasets exhibiting high-order correlations. Compared to the best baseline, AIM, Tab-PE improves classification accuracy by up to 10% while running 28× faster.
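To make the evolutionary loop concrete, here is a minimal Python/NumPy sketch of a PE-style iteration on categorical data. It is not the authors' Tab-PE implementation: the initializer `random_population`, the mutation operator `mutate` (resample one column per record), the use of Hamming distance, and the parameters `sigma` and `threshold` are hypothetical choices made for illustration. The private scoring step follows the general idea of PE's DP nearest-neighbor histogram, in which each private record votes for its closest candidate and Gaussian noise is added to the vote counts; calibrating `sigma` to a target (ε, δ) across iterations is not shown.

```python
import numpy as np


def random_population(n, domain_sizes, rng):
    """Hypothetical initializer: draw each categorical column uniformly from its domain."""
    return np.stack([rng.integers(0, k, size=n) for k in domain_sizes], axis=1)


def mutate(samples, domain_sizes, rng):
    """Hypothetical variation operator: resample one random column of each record."""
    out = samples.copy()
    cols = rng.integers(0, samples.shape[1], size=len(samples))
    for i, c in enumerate(cols):
        out[i, c] = rng.integers(0, domain_sizes[c])
    return out


def dp_nn_histogram(private, candidates, sigma, rng):
    """Each private record votes for its nearest candidate (Hamming distance).

    Adding or removing one private record changes the vote vector by 1 in a
    single coordinate, so Gaussian noise with std `sigma` is added to the counts.
    """
    dists = (private[:, None, :] != candidates[None, :, :]).sum(axis=2)
    votes = np.bincount(dists.argmin(axis=1), minlength=len(candidates))
    return votes.astype(float) + rng.normal(0.0, sigma, size=len(candidates))


def private_evolution(private, domain_sizes, n_syn, iters, sigma, threshold, seed=0):
    """Sketch of a PE-style loop; privacy accounting across iterations is omitted."""
    rng = np.random.default_rng(seed)
    syn = random_population(n_syn, domain_sizes, rng)
    for _ in range(iters):
        # Candidate pool: the current records plus one variation of each.
        candidates = np.concatenate([syn, mutate(syn, domain_sizes, rng)])
        scores = dp_nn_histogram(private, candidates, sigma, rng)
        # Subtract the threshold and clip at zero so noise-only bins get no weight.
        scores = np.clip(scores - threshold, 0.0, None)
        total = scores.sum()
        if total > 0:
            probs = scores / total
        else:
            probs = np.full(len(candidates), 1.0 / len(candidates))
        # Keep n_syn records, sampled in proportion to their noisy scores.
        syn = candidates[rng.choice(len(candidates), size=n_syn, p=probs)]
    return syn


# Usage: a toy "private" table with three categorical columns.
rng = np.random.default_rng(1)
domain_sizes = [2, 3, 4]
private = random_population(500, domain_sizes, rng)
syn = private_evolution(private, domain_sizes, n_syn=100, iters=5,
                        sigma=5.0, threshold=10.0)
print(syn.shape)  # (100, 3)
```

Because each private record casts exactly one vote per iteration, the vote vector has L2 sensitivity 1, which is what makes the Gaussian mechanism applicable to the scoring step; the total privacy cost grows with `iters` and would need to be tracked with a composition accountant.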

Lay Summary

Tabular datasets, such as medical records, financial transactions, and survey data, are widely used in research and machine learning but often contain sensitive personal information. Synthetic data offers a way to share and analyze such data with reduced privacy risk by creating artificial records that resemble the original dataset. This paper studies how to generate synthetic tabular data with formal privacy guarantees. Existing methods often work well when data patterns involve only a few columns, but they can struggle when important patterns depend on complex interactions across many columns. To address this challenge, we propose Tab-PE, a new method inspired by evolutionary search. Tab-PE starts with random synthetic records, creates small variations of them, privately evaluates which records best match the real data, and repeats this process to progressively improve the synthetic dataset. Whereas previous leading methods often require many costly computations over the data on GPUs, Tab-PE runs entirely on CPUs and, in our experiments, generates synthetic data within seconds. This makes it efficient and practical for tabular data. Experiments show that Tab-PE preserves complex data patterns better than existing methods, especially under strong privacy constraints, while also running much faster than competitive baselines.
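Why patterns spanning many columns are hard to capture with low-order statistics can be seen in a toy example (our own illustration, not taken from the paper's experiments). In the first table below, column `c` is the XOR of columns `a` and `b`; in the second, all three columns are independent. Their 1-way and 2-way marginals are identical, so a generator that only matches marginals over one or two columns cannot tell the two tables apart; only the 3-way marginal reveals the dependency. The helper names `marginal` and `same_marginals` are hypothetical.

```python
import itertools
from collections import Counter


def marginal(rows, cols):
    """Count each combination of values in the chosen columns (a k-way marginal)."""
    return Counter(tuple(row[c] for c in cols) for row in rows)


def same_marginals(t1, t2, k, n_cols=3):
    """True if the two tables agree on every k-way marginal."""
    return all(marginal(t1, cols) == marginal(t2, cols)
               for cols in itertools.combinations(range(n_cols), k))


# Table A: c = a XOR b (a 3-way dependency); each pattern appears twice.
parity = [(a, b, a ^ b) for a, b in itertools.product([0, 1], repeat=2)] * 2
# Table B: all three bits mutually independent (every pattern once).
independent = list(itertools.product([0, 1], repeat=3))

print(same_marginals(parity, independent, 1))  # True
print(same_marginals(parity, independent, 2))  # True
print(same_marginals(parity, independent, 3))  # False
```

Both tables have eight rows, so the comparison is on raw counts; the same conclusion holds for normalized frequencies.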