Bounding User Contributions: A Bias-Variance Trade-off in Differential Privacy

Abstract
Differentially private learning algorithms protect individual participants in the training dataset by guaranteeing that their presence does not significantly change the resulting model. In order to make this promise, such algorithms need to know the maximum contribution that can be made by a single user: the more data an individual can contribute, the more noise will need to be added to protect them. While most existing analyses assume that the maximum contribution is known and fixed in advance—indeed, it is often assumed that each user contributes only a single example—we argue that in practice there is a meaningful choice to be made. On the one hand, if we allow users to contribute large amounts of data, we may end up adding excessive noise to protect a few outliers, even when the majority contribute only modestly. On the other hand, limiting users to small contributions keeps noise levels low at the cost of potentially discarding significant amounts of excess data, thus introducing bias. Here, we characterize this trade-off for an empirical risk minimization setting, showing that in general there is a “sweet spot” that depends on measurable properties of the dataset, but that there is also a concrete cost to privacy that cannot be avoided simply by collecting more data.
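The trade-off described above can be made concrete with a small numerical sketch. The snippet below is not the paper's ERM algorithm; it is a minimal, hypothetical illustration on a simpler task (a differentially private mean over values in [0, 1]), where each user's contribution is capped at k examples and Laplace noise is scaled to the cap. The function dp_mean_with_cap, the synthetic data-generating process, and the simplified sensitivity accounting are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def dp_mean_with_cap(user_data, k, eps):
    """Toy DP mean estimate with per-user contributions capped at k examples.

    Values are assumed to lie in [0, 1]. Removing one user changes the
    retained sum by at most k and the retained count by at most k, so
    Laplace noise with scale k / (eps / 2) on each quantity suffices for
    eps-DP under basic composition (a standard, simplified accounting).
    This is an illustrative sketch, not the paper's ERM mechanism.
    """
    capped = [u[:k] for u in user_data]              # discard excess data per user
    total = sum(float(np.sum(u)) for u in capped)
    count = sum(len(u) for u in capped)
    noisy_total = total + rng.laplace(scale=2 * k / eps)
    noisy_count = count + rng.laplace(scale=2 * k / eps)
    return noisy_total / max(noisy_count, 1.0)


# Synthetic population (assumption): heavy contributors have slightly higher
# values, so truncation introduces bias, while large caps inflate the noise.
user_data = []
for _ in range(2000):
    m = rng.geometric(0.2)                           # number of examples for this user
    mean_i = 0.4 + 0.02 * min(m, 10)                 # correlated with contribution size
    user_data.append(np.clip(rng.normal(mean_i, 0.1, size=m), 0.0, 1.0))

true_mean = np.mean(np.concatenate(user_data))
for k in [1, 2, 5, 10, 20, 50]:
    estimates = [dp_mean_with_cap(user_data, k, eps=1.0) for _ in range(200)]
    rmse = np.sqrt(np.mean((np.array(estimates) - true_mean) ** 2))
    print(f"cap k={k:3d}  RMSE={rmse:.4f}")
```

Under this toy model, small caps discard data from heavy contributors (bias) while large caps inflate the added noise (variance), so the error is typically minimized at an intermediate cap, mirroring the "sweet spot" described above.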
Author Information
Kareem Amin (Google Research)
Alex Kulesza (Google)
Andres Munoz (Google)
Sergei Vassilvitskii (Google)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Oral: Bounding User Contributions: A Bias-Variance Trade-off in Differential Privacy
  Thu. Jun 13th, 04:25 -- 04:30 PM, Room: Seaside Ballroom
More from the Same Authors
- 2021: Learning with User-Level Privacy
  Daniel A Levy · Ziteng Sun · Kareem Amin · Satyen Kale · Alex Kulesza · Mehryar Mohri · Ananda Theertha Suresh
- 2023: Learning-augmented private algorithms for multiple quantile release
  Mikhail Khodak · Kareem Amin · Travis Dick · Sergei Vassilvitskii
- 2023 Poster: Learning-augmented private algorithms for multiple quantile release
  Mikhail Khodak · Kareem Amin · Travis Dick · Sergei Vassilvitskii
- 2023 Poster: Label differential privacy and private training data release
  Robert Busa-Fekete · Andres Munoz · Umar Syed · Sergei Vassilvitskii
- 2023 Tutorial: How to DP-fy ML: A Practical Tutorial to Machine Learning with Differential Privacy
  Sergei Vassilvitskii · Natalia Ponomareva · Zheng Xu
- 2022 Poster: A Joint Exponential Mechanism For Differentially Private Top-$k$
  Jennifer Gillenwater · Matthew Joseph · Andres Munoz · Monica Ribero Diaz
- 2022 Spotlight: A Joint Exponential Mechanism For Differentially Private Top-$k$
  Jennifer Gillenwater · Matthew Joseph · Andres Munoz · Monica Ribero Diaz
- 2021 Poster: Differentially Private Quantiles
  Jennifer Gillenwater · Matthew Joseph · Alex Kulesza
- 2021 Spotlight: Differentially Private Quantiles
  Jennifer Gillenwater · Matthew Joseph · Alex Kulesza
- 2020: Duff: A Dataset-Based Utility Function Family for the Exponential Mechanism
  Andres Munoz
- 2019 Workshop: Negative Dependence: Theory and Applications in Machine Learning
  Mike Gartrell · Jennifer Gillenwater · Alex Kulesza · Zelda Mariet
- 2019 Poster: A Tree-Based Method for Fast Repeated Sampling of Determinantal Point Processes
  Jennifer Gillenwater · Alex Kulesza · Zelda Mariet · Sergei Vassilvitskii
- 2019 Oral: A Tree-Based Method for Fast Repeated Sampling of Determinantal Point Processes
  Jennifer Gillenwater · Alex Kulesza · Zelda Mariet · Sergei Vassilvitskii
- 2018 Poster: Competitive Caching with Machine Learned Advice
  Thodoris Lykouris · Sergei Vassilvitskii
- 2018 Oral: Competitive Caching with Machine Learned Advice
  Thodoris Lykouris · Sergei Vassilvitskii
- 2017 Poster: Consistent k-Clustering
  Silvio Lattanzi · Sergei Vassilvitskii
- 2017 Talk: Consistent k-Clustering
  Silvio Lattanzi · Sergei Vassilvitskii