Timezone: »

Privacy for Free: How does Dataset Condensation Help Privacy?
Tian Dong · Bo Zhao · Lingjuan Lyu

Tue Jul 19 03:30 PM -- 05:30 PM (PDT) @ Hall E #913

To prevent unintentional data leakage, research community has resorted to data generators that can produce differentially private data for model training. However, for the sake of the data privacy, existing solutions suffer from either expensive training cost or poor generalization performance. Therefore, we raise the question whether training efficiency and privacy can be achieved simultaneously. In this work, we for the first time identify that dataset condensation (DC) which is originally designed for improving training efficiency can be a better solution to replace data generators for private data generation, thus providing privacy for free. To demonstrate the privacy benefit of DC, we build a connection between DC and differential privacy (DP), and theoretically prove on linear feature extractors (and then extended to non-linear feature extractors) that the existence of one sample has limited impact (O(m/n)) on the parameter distribution of networks trained on m samples synthesized from n (n >> m) raw data by DC. We also empirically validate the vision privacy and membership privacy of DC-synthesized data by launching both the loss-based and the state-of-the-art likelihood-based membership inference attacks. We envision this work as a milestone for data-efficient and privacy-preserving machine learning.

Author Information

Tian Dong (Shanghai Jiao Tong University)
Bo Zhao (The University of Edinburgh)
Lingjuan Lyu (Sony AI)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors