There has been recent interest in detecting and addressing memorization of training data by deep neural networks. A formal framework for memorization in generative models, called "data-copying," was proposed by Meehan et al. (2020). We build on their work to show that their framework can fail to detect certain kinds of blatant memorization. Motivated by this, and by the theory of non-parametric methods, we provide an alternative definition of data-copying that applies more locally. We give a method for detecting data-copying and prove that it succeeds with high probability when enough data is available. We also provide lower bounds that characterize the sample complexity needed for reliable detection.
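To give a concrete feel for this kind of local test, the following is a minimal, hypothetical sketch and not the procedure from the paper: it flags a generative model as data-copying when its samples sit unusually close to the training set, calibrated against a held-out set of fresh samples. The quantile level `alpha`, the distance-based criterion, and the toy Gaussian data are all assumptions made for illustration.

```python
# Illustrative sketch only (not the authors' exact test): a local,
# nearest-neighbor style check for data-copying. We compare how close
# generated samples sit to the training set against how close genuinely
# fresh (held-out) samples sit to it.
import numpy as np
from scipy.spatial import cKDTree

def copy_rate(train, heldout, generated, alpha=0.05):
    """Fraction of generated samples that are suspiciously close to the
    training set, where "suspiciously close" means closer than the
    alpha-quantile of held-out-to-training nearest-neighbor distances.
    For a non-copying model this fraction should be roughly alpha."""
    tree = cKDTree(train)
    d_held, _ = tree.query(heldout)    # NN distance of each held-out point to the training set
    d_gen, _ = tree.query(generated)   # NN distance of each generated point to the training set
    threshold = np.quantile(d_held, alpha)
    return float(np.mean(d_gen < threshold))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(500, 2))
    heldout = rng.normal(size=(500, 2))
    # A "copying" generator: half near-duplicates of training points, half fresh samples.
    copies = train[rng.integers(0, 500, size=250)] + 1e-3 * rng.normal(size=(250, 2))
    fresh = rng.normal(size=(250, 2))
    copying_model = np.vstack([copies, fresh])
    honest_model = rng.normal(size=(500, 2))
    print("copying model:", copy_rate(train, heldout, copying_model))  # well above 0.05
    print("honest model: ", copy_rate(train, heldout, honest_model))   # close to 0.05
```

A rate far above `alpha` for the copying model, versus roughly `alpha` for the honest one, is the qualitative behavior such a local detector aims for; the paper's own method and guarantees are considerably more refined.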
Author Information
Robi Bhattacharjee (UCSD)
Sanjoy Dasgupta (UC San Diego)
Kamalika Chaudhuri (UCSD, Meta AI Research, and FAIR)
More from the Same Authors
- 2021: Understanding Instance-based Interpretability of Variational Auto-Encoders
  Zhifeng Kong · Kamalika Chaudhuri
- 2021: Privacy Amplification by Bernoulli Sampling
  Jacob Imola · Kamalika Chaudhuri
- 2021: A Shuffling Framework For Local Differential Privacy
  Casey M Meehan · Amrita Roy Chowdhury · Kamalika Chaudhuri · Somesh Jha
- 2021: Privacy Amplification by Subsampling in Time Domain
  Tatsuki Koga · Casey M Meehan · Kamalika Chaudhuri
- 2022: Robust Empirical Risk Minimization with Tolerance
  Robi Bhattacharjee · Max Hopkins · Akash Kumar · Hantao Yu · Kamalika Chaudhuri
- 2022: Understanding Rare Spurious Correlations in Neural Networks
  Yao-Yuan Yang · Chi-Ning Chou · Kamalika Chaudhuri
- 2023: Machine Learning with Feature Differential Privacy
  Saeed Mahloujifar · Chuan Guo · G. Edward Suh · Kamalika Chaudhuri
- 2023: Panel Discussion
  Peter Kairouz · Song Han · Kamalika Chaudhuri · Florian Tramer
- 2023: Kamalika Chaudhuri
  Kamalika Chaudhuri
- 2023 Poster: Privacy-Aware Compression for Federated Learning Through Numerical Mechanism Design
  Chuan Guo · Kamalika Chaudhuri · Pierre Stock · Michael Rabbat
- 2023 Oral: Why does Throwing Away Data Improve Worst-Group Error?
  Kamalika Chaudhuri · Kartik Ahuja · Martin Arjovsky · David Lopez-Paz
- 2023 Poster: A Two-Stage Active Learning Algorithm for k-Nearest Neighbors
  Nicholas Rittler · Kamalika Chaudhuri
- 2023 Poster: Why does Throwing Away Data Improve Worst-Group Error?
  Kamalika Chaudhuri · Kartik Ahuja · Martin Arjovsky · David Lopez-Paz
- 2022 Poster: Thompson Sampling for Robust Transfer in Multi-Task Bandits
  Zhi Wang · Chicheng Zhang · Kamalika Chaudhuri
- 2022 Poster: Constants Matter: The Performance Gains of Active Learning
  Stephen Mussmann · Sanjoy Dasgupta
- 2022 Spotlight: Constants Matter: The Performance Gains of Active Learning
  Stephen Mussmann · Sanjoy Dasgupta
- 2022 Spotlight: Thompson Sampling for Robust Transfer in Multi-Task Bandits
  Zhi Wang · Chicheng Zhang · Kamalika Chaudhuri
- 2022 Poster: Bounding Training Data Reconstruction in Private (Deep) Learning
  Chuan Guo · Brian Karrer · Kamalika Chaudhuri · Laurens van der Maaten
- 2022 Poster: Framework for Evaluating Faithfulness of Local Explanations
  Sanjoy Dasgupta · Nave Frost · Michal Moshkovitz
- 2022 Spotlight: Framework for Evaluating Faithfulness of Local Explanations
  Sanjoy Dasgupta · Nave Frost · Michal Moshkovitz
- 2022 Oral: Bounding Training Data Reconstruction in Private (Deep) Learning
  Chuan Guo · Brian Karrer · Kamalika Chaudhuri · Laurens van der Maaten
- 2021: Discussion Panel #2
  Bo Li · Nicholas Carlini · Andrzej Banburski · Kamalika Chaudhuri · Will Xiao · Cihang Xie
- 2021: Invited Talk #9
  Kamalika Chaudhuri
- 2021: Invited Talk: Kamalika Chaudhuri
  Kamalika Chaudhuri
- 2021: Invited Talk: Kamalika Chaudhuri
  Kamalika Chaudhuri
- 2021: Live Panel Discussion
  Thomas Dietterich · Chelsea Finn · Kamalika Chaudhuri · Yarin Gal · Uri Shalit
- 2021 Poster: Sample Complexity of Robust Linear Classification on Separated Data
  Robi Bhattacharjee · Somesh Jha · Kamalika Chaudhuri
- 2021 Spotlight: Sample Complexity of Robust Linear Classification on Separated Data
  Robi Bhattacharjee · Somesh Jha · Kamalika Chaudhuri
- 2021 Poster: Connecting Interpretability and Robustness in Decision Trees through Separation
  Michal Moshkovitz · Yao-Yuan Yang · Kamalika Chaudhuri
- 2021 Spotlight: Connecting Interpretability and Robustness in Decision Trees through Separation
  Michal Moshkovitz · Yao-Yuan Yang · Kamalika Chaudhuri
- 2020 Poster: When are Non-Parametric Methods Robust?
  Robi Bhattacharjee · Kamalika Chaudhuri
- 2020 Poster: Explainable k-Means and k-Medians Clustering
  Michal Moshkovitz · Sanjoy Dasgupta · Cyrus Rashtchian · Nave Frost
- 2019 Poster: Teaching a black-box learner
  Sanjoy Dasgupta · Daniel Hsu · Stefanos Poulis · Jerry Zhu
- 2019 Oral: Teaching a black-box learner
  Sanjoy Dasgupta · Daniel Hsu · Stefanos Poulis · Jerry Zhu
- 2019 Talk: Opening Remarks
  Kamalika Chaudhuri · Ruslan Salakhutdinov
- 2018 Poster: Active Learning with Logged Data
  Songbai Yan · Kamalika Chaudhuri · Tara Javidi
- 2018 Poster: Analyzing the Robustness of Nearest Neighbors to Adversarial Examples
  Yizhen Wang · Somesh Jha · Kamalika Chaudhuri
- 2018 Oral: Active Learning with Logged Data
  Songbai Yan · Kamalika Chaudhuri · Tara Javidi
- 2018 Oral: Analyzing the Robustness of Nearest Neighbors to Adversarial Examples
  Yizhen Wang · Somesh Jha · Kamalika Chaudhuri
- 2018 Tutorial: Understanding your Neighbors: Practical Perspectives From Modern Analysis
  Sanjoy Dasgupta · Samory Kpotufe
- 2017 Workshop: Picky Learners: Choosing Alternative Ways to Process Data.
  Corinna Cortes · Kamalika Chaudhuri · Giulia DeSalvo · Ningshan Zhang · Chicheng Zhang
- 2017 Poster: Active Heteroscedastic Regression
  Kamalika Chaudhuri · Prateek Jain · Nagarajan Natarajan
- 2017 Poster: Diameter-Based Active Learning
  Christopher Tosh · Sanjoy Dasgupta
- 2017 Talk: Diameter-Based Active Learning
  Christopher Tosh · Sanjoy Dasgupta
- 2017 Talk: Active Heteroscedastic Regression
  Kamalika Chaudhuri · Prateek Jain · Nagarajan Natarajan