

Bounding User Contributions: A Bias-Variance Trade-off in Differential Privacy

Kareem Amin · Alex Kulesza · Andres Munoz · Sergei Vassilvitskii

Pacific Ballroom #123

Keywords: [ Supervised Learning ] [ Privacy-preserving Statistics and Machine Learning ] [ Computational Learning Theory ]


Differentially private learning algorithms protect individual participants in the training dataset by guaranteeing that their presence does not significantly change the resulting model. In order to make this promise, such algorithms need to know the maximum contribution that can be made by a single user: the more data an individual can contribute, the more noise will need to be added to protect them. While most existing analyses assume that the maximum contribution is known and fixed in advance—indeed, it is often assumed that each user contributes only a single example—we argue that in practice there is a meaningful choice to be made. On the one hand, if we allow users to contribute large amounts of data, we may end up adding excessive noise to protect a few outliers, even when the majority contribute only modestly. On the other hand, limiting users to small contributions keeps noise levels low at the cost of potentially discarding significant amounts of excess data, thus introducing bias. Here, we characterize this trade-off for an empirical risk minimization setting, showing that in general there is a “sweet spot” that depends on measurable properties of the dataset, but that there is also a concrete cost to privacy that cannot be avoided simply by collecting more data.
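The trade-off described above can be made concrete with a small sketch. This is not the paper's algorithm or its empirical risk minimization setting. It is a simplified illustration that assumes a user-level private sum released with Laplace noise, where each value is clipped to `[0, value_bound]` and each user keeps at most `cap` examples. All function names (`cap_contributions`, `laplace_scale`, `private_sum`, `tradeoff`) are hypothetical.

```python
import random
from collections import defaultdict


def clip(value, value_bound):
    """Clip a value into [0, value_bound] so one example has bounded influence."""
    return min(max(value, 0.0), value_bound)


def cap_contributions(records, cap):
    """Keep at most `cap` examples per user; any excess is discarded (source of bias)."""
    counts = defaultdict(int)
    kept = []
    for user, value in records:
        if counts[user] < cap:
            counts[user] += 1
            kept.append((user, value))
    return kept


def laplace_scale(cap, value_bound, epsilon):
    """Noise scale for a sum whose user-level sensitivity is cap * value_bound."""
    return cap * value_bound / epsilon


def sample_laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_sum(records, cap, value_bound, epsilon, rng):
    """Sum of capped, clipped values plus Laplace noise (illustrative assumption)."""
    kept = cap_contributions(records, cap)
    capped_sum = sum(clip(v, value_bound) for _, v in kept)
    return capped_sum + sample_laplace(laplace_scale(cap, value_bound, epsilon), rng)


def tradeoff(records, caps, value_bound, epsilon):
    """For each cap, report (cap, bias from discarded data, noise standard deviation)."""
    full_sum = sum(clip(v, value_bound) for _, v in records)
    rows = []
    for cap in caps:
        kept = cap_contributions(records, cap)
        bias = full_sum - sum(clip(v, value_bound) for _, v in kept)
        noise_std = (2 ** 0.5) * laplace_scale(cap, value_bound, epsilon)
        rows.append((cap, bias, noise_std))
    return rows
```

In this sketch, raising `cap` shrinks the bias column toward zero while the noise column grows linearly, which mirrors the "sweet spot" behavior the abstract describes: a dataset where most users contribute little and a few contribute a lot favors a small cap.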
