Modern machine learning models are complex and frequently encode surprising amounts of information about individual inputs. In extreme cases, complex models appear to memorize entire input examples, including seemingly irrelevant information (social security numbers from text, for example). In this paper, we aim to understand whether this sort of memorization is necessary for accurate learning. We describe natural prediction problems in which every sufficiently accurate training algorithm must encode, in the prediction model, essentially all the information about a large subset of its training examples. This remains true even when the examples are high-dimensional and have entropy much higher than the sample size, and even when most of that information is ultimately irrelevant to the task at hand. Further, our results do not depend on the training algorithm or the class of models used for learning.
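The flavor of this guarantee can be stated information-theoretically. As a rough schematic in our own notation (not the paper's exact theorem statement): let S = (X_1, ..., X_n) be a training sample of d-bit examples with d much larger than n, let A be any (possibly randomized) training algorithm, and suppose A reaches some accuracy threshold on the task. The claim is then, roughly,

\[
\operatorname{err}(A) \le \varepsilon_0
\quad\Longrightarrow\quad
I\bigl(A(S);\, S\bigr) \;=\; \Omega(n \, d),
\]

so the trained model A(S) retains essentially all d bits of a constant fraction of the n examples, even though most of those bits are irrelevant to the prediction task.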
Our problems are simple and fairly natural variants of the next-symbol prediction and the cluster labeling tasks. These tasks can be seen as abstractions of text- and image-related prediction problems. To establish our results, we reduce from a family of one-way communication problems for which we prove new information complexity lower bounds.
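To make the shape of that argument concrete, here is a schematic of the reduction (an illustration in our own notation, not a restatement of the paper's proofs). A learner A for one of these tasks is viewed as a one-way protocol: Alice, who holds the sample S, runs the learner and sends the model M = A(S) as her single message; Bob answers the communication problem using M alone. Sufficient accuracy of A makes the protocol succeed, so an information-complexity lower bound for the one-way problem becomes a lower bound on how much the model reveals about the sample:

\[
A \text{ sufficiently accurate}
\;\Longrightarrow\;
\text{the protocol with message } M = A(S) \text{ succeeds}
\;\Longrightarrow\;
I(M; S) \;\ge\; \mathrm{IC}_{\rightarrow},
\]

where \(\mathrm{IC}_{\rightarrow}\) denotes the one-way information complexity of the underlying problem; the new lower bounds proved in the paper make this quantity comparable to the total entropy of a large subset of the examples.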
Author Information
Gavin Brown (Boston University)
Mark Bun (Boston University)
Vitaly Feldman (Google Brain)
Adam Smith (Boston University)
Kunal Talwar (Apple)
More from the Same Authors
- 2021 : Lossless Compression of Efficient Private Local Randomizers
  Vitaly Feldman · Kunal Talwar
- 2021 : Differential Secrecy for Distributed Data and Applications to Robust Differentially Secure Vector Summation
  Kunal Talwar
- 2021 : Nonparametric Differentially Private Confidence Intervals for the Median
  Joerg Drechsler · Ira Globus-Harris · Audra McMillan · Adam Smith · Jayshree Sarathy
- 2021 : Hiding Among the Clones: A Simple and Nearly Optimal Analysis of Privacy Amplification by Shuffling
  Vitaly Feldman · Audra McMillan · Kunal Talwar
- 2021 : Mean Estimation with User-level Privacy under Data Heterogeneity
  Rachel Cummings · Vitaly Feldman · Audra McMillan · Kunal Talwar
- 2021 : Multiclass versus Binary Differentially Private PAC Learning
  Satchit Sivakumar · Mark Bun · Marco Gaboardi
- 2021 : Differentially Private Model Personalization
  Prateek Jain · J K Rush · Adam Smith · Shuang Song · Abhradeep Guha Thakurta
- 2021 : The Flajolet-Martin Sketch Itself Preserves Differential Privacy: Private Counting with Minimal Space
  Adam Smith · Shuang Song · Abhradeep Guha Thakurta
- 2021 : Differentially Private Sampling from Distributions
  Satchit Sivakumar · Marika Swanberg · Sofya Raskhodnikova · Adam Smith
- 2021 : A Practitioners Guide to Differentially Private Convex Optimization
  Ryan McKenna · Hristo Paskov · Kunal Talwar
- 2021 : Covariance-Aware Private Mean Estimation Without Private Covariance Estimation
  Gavin Brown · Marco Gaboardi · Adam Smith · Jonathan Ullman · Lydia Zakynthinou
- 2023 : Differentially Private Heavy Hitters using Federated Analytics
  Karan Chadha · Junye Chen · John Duchi · Vitaly Feldman · Hanieh Hashemi · Omid Javidbakht · Audra McMillan · Kunal Talwar
- 2023 Poster: The Price of Differential Privacy under Continual Observation
  Palak Jain · Sofya Raskhodnikova · Satchit Sivakumar · Adam Smith
- 2023 Oral: The Price of Differential Privacy under Continual Observation
  Palak Jain · Sofya Raskhodnikova · Satchit Sivakumar · Adam Smith
- 2023 Poster: Near-Optimal Algorithms for Private Online Optimization in the Realizable Regime
  Hilal Asi · Vitaly Feldman · Tomer Koren · Kunal Talwar
- 2022 : Low-Communication Algorithms for Private Federated Data Analysis
  Kunal Talwar
- 2022 Poster: Optimal Algorithms for Mean Estimation under Local Differential Privacy
  Hilal Asi · Vitaly Feldman · Kunal Talwar
- 2022 Oral: Optimal Algorithms for Mean Estimation under Local Differential Privacy
  Hilal Asi · Vitaly Feldman · Kunal Talwar
- 2022 Poster: Private frequency estimation via projective geometry
  Vitaly Feldman · Jelani Nelson · Huy Nguyen · Kunal Talwar
- 2022 Spotlight: Private frequency estimation via projective geometry
  Vitaly Feldman · Jelani Nelson · Huy Nguyen · Kunal Talwar
- 2021 Poster: Private Adaptive Gradient Methods for Convex Optimization
  Hilal Asi · John Duchi · Alireza Fallah · Omid Javidbakht · Kunal Talwar
- 2021 Poster: Lossless Compression of Efficient Private Local Randomizers
  Vitaly Feldman · Kunal Talwar
- 2021 Poster: Private Stochastic Convex Optimization: Optimal Rates in L1 Geometry
  Hilal Asi · Vitaly Feldman · Tomer Koren · Kunal Talwar
- 2021 Spotlight: Private Adaptive Gradient Methods for Convex Optimization
  Hilal Asi · John Duchi · Alireza Fallah · Omid Javidbakht · Kunal Talwar
- 2021 Oral: Private Stochastic Convex Optimization: Optimal Rates in L1 Geometry
  Hilal Asi · Vitaly Feldman · Tomer Koren · Kunal Talwar
- 2021 Spotlight: Lossless Compression of Efficient Private Local Randomizers
  Vitaly Feldman · Kunal Talwar
- 2021 Poster: Differentially Private Correlation Clustering
  Mark Bun · Marek Elias · Janardhan Kulkarni
- 2021 Spotlight: Differentially Private Correlation Clustering
  Mark Bun · Marek Elias · Janardhan Kulkarni
- 2021 Poster: Characterizing Structural Regularities of Labeled Data in Overparameterized Models
  Ziheng Jiang · Chiyuan Zhang · Kunal Talwar · Michael Mozer
- 2021 Oral: Characterizing Structural Regularities of Labeled Data in Overparameterized Models
  Ziheng Jiang · Chiyuan Zhang · Kunal Talwar · Michael Mozer
- 2019 Poster: The advantages of multiple classes for reducing overfitting from test set reuse
  Vitaly Feldman · Roy Frostig · Moritz Hardt
- 2019 Oral: The advantages of multiple classes for reducing overfitting from test set reuse
  Vitaly Feldman · Roy Frostig · Moritz Hardt