Modern neural language models that are widely used in various NLP tasks risk memorizing sensitive information from their training data. Understanding this memorization is important in real-world applications and also from a learning-theoretical perspective. An open question in previous studies of language model memorization is how to filter out "common" memorization. In fact, most memorization criteria strongly correlate with the number of occurrences in the training set, capturing memorized familiar phrases, public knowledge, templated texts, or other repeated data. We formulate a notion of counterfactual memorization which characterizes how a model's predictions change if a particular document is omitted during training. We identify and study counterfactually-memorized training examples in standard text datasets. We estimate the influence of each memorized training example on the validation set and on generated texts, showing how this can provide direct evidence of the source of memorization at test time.
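The sketch below illustrates one plausible way to estimate the quantities the abstract describes: counterfactual memorization as the difference in a model's per-example performance between models whose training subset contained the example and models whose subset did not, and, analogously, the influence of a training example on a held-out target. The array names, the random-subset training setup, and the per-example score function are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch, assuming we have already trained `num_models` models on
# random subsets of the training data and recorded per-example scores
# (e.g., per-token accuracy). Names and shapes are illustrative.
import numpy as np

def counterfactual_memorization(inclusion, scores):
    """Estimate counterfactual memorization for each training example.

    inclusion: (num_models, num_examples) bool mask; True if the example was
        in that model's training subset.
    scores: (num_models, num_examples) score of each model on each example.

    Returns a (num_examples,) array: mean score of "IN" models (which saw the
    example) minus mean score of "OUT" models (which did not). NaN if an
    example has no IN or no OUT models.
    """
    mem = np.full(inclusion.shape[1], np.nan)
    for j in range(inclusion.shape[1]):
        in_mask = inclusion[:, j]
        if in_mask.any() and (~in_mask).any():
            mem[j] = scores[in_mask, j].mean() - scores[~in_mask, j].mean()
    return mem

def influence_on_target(inclusion, target_scores, train_idx):
    """Estimate the influence of training example `train_idx` on one
    validation or generated text: the IN-vs-OUT difference of the models'
    scores on that target.

    target_scores: (num_models,) score of each model on the target text.
    """
    in_mask = inclusion[:, train_idx]
    return target_scores[in_mask].mean() - target_scores[~in_mask].mean()
```

Under this reading, a training example with high counterfactual memorization is one that models predict well only when it was actually in their training subset, which filters out "common" text that any model predicts well regardless of inclusion.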
Author Information
Chiyuan Zhang (MIT)
Daphne Ippolito (Carnegie Mellon University)
Katherine Lee (Google DeepMind)
Matthew Jagielski (Google)
Florian Tramer (ETH Zürich)
Nicholas Carlini (Google)
More from the Same Authors
- 2021 : Randomized Response with Prior and Applications to Learning with Label Differential Privacy »
  Badih Ghazi · Noah Golowich · Ravi Kumar · Pasin Manurangsi · Chiyuan Zhang
- 2023 : Backdoor Attacks for In-Context Learning with Language Models »
  Nikhil Kandpal · Matthew Jagielski · Florian Tramer · Nicholas Carlini
- 2023 : On the Reproducibility of Data Valuation under Learning Stochasticity »
  Jiachen Wang · Feiyang Kang · Chiyuan Zhang · Ruoxi Jia · Prateek Mittal
- 2023 : Privacy Auditing with One (1) Training Run »
  Thomas Steinke · Milad Nasresfahani · Matthew Jagielski
- 2023 Workshop: Generative AI and Law (GenLaw) »
  Katherine Lee · A. Feder Cooper · FatemehSadat Mireshghallah · Madiha Zahrah · James Grimmelmann · David Mimno · Deep Ganguli · Ludwig Schubert
- 2023 : Panel Discussion »
  Peter Kairouz · Song Han · Kamalika Chaudhuri · Florian Tramer
- 2023 Workshop: Challenges in Deployable Generative AI »
  Swami Sankaranarayanan · Thomas Hartvigsen · Camille Bilodeau · Ryutaro Tanno · Cheng Zhang · Florian Tramer · Phillip Isola
- 2023 Poster: Can Neural Network Memorization Be Localized? »
  Pratyush Maini · Michael Mozer · Hanie Sedghi · Zachary Lipton · Zico Kolter · Chiyuan Zhang
- 2023 Poster: On User-Level Private Convex Optimization »
  Badih Ghazi · Pritish Kamath · Ravi Kumar · Pasin Manurangsi · Raghu Meka · Chiyuan Zhang
- 2022 Workshop: Knowledge Retrieval and Language Models »
  Maithra Raghu · Urvashi Khandelwal · Chiyuan Zhang · Matei Zaharia · Alexander Rush
- 2021 Poster: Understanding Invariance via Feedforward Inversion of Discriminatively Trained Classifiers »
  Piotr Teterwak · Chiyuan Zhang · Dilip Krishnan · Michael Mozer
- 2021 Poster: Characterizing Structural Regularities of Labeled Data in Overparameterized Models »
  Ziheng Jiang · Chiyuan Zhang · Kunal Talwar · Michael Mozer
- 2021 Spotlight: Understanding Invariance via Feedforward Inversion of Discriminatively Trained Classifiers »
  Piotr Teterwak · Chiyuan Zhang · Dilip Krishnan · Michael Mozer
- 2021 Oral: Characterizing Structural Regularities of Labeled Data in Overparameterized Models »
  Ziheng Jiang · Chiyuan Zhang · Kunal Talwar · Michael Mozer
- 2020 Poster: Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations »
  Florian Tramer · Jens Behrmann · Nicholas Carlini · Nicolas Papernot · Joern-Henrik Jacobsen
- 2019 Poster: Adversarial Examples Are a Natural Consequence of Test Error in Noise »
  Justin Gilmer · Nicolas Ford · Nicholas Carlini · Ekin Dogus Cubuk
- 2019 Poster: Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition »
  Yao Qin · Nicholas Carlini · Garrison Cottrell · Ian Goodfellow · Colin Raffel
- 2019 Oral: Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition »
  Yao Qin · Nicholas Carlini · Garrison Cottrell · Ian Goodfellow · Colin Raffel
- 2019 Oral: Adversarial Examples Are a Natural Consequence of Test Error in Noise »
  Justin Gilmer · Nicolas Ford · Nicholas Carlini · Ekin Dogus Cubuk
- 2018 Poster: Machine Theory of Mind »
  Neil Rabinowitz · Frank Perbet · Francis Song · Chiyuan Zhang · S. M. Ali Eslami · Matthew Botvinick
- 2018 Oral: Machine Theory of Mind »
  Neil Rabinowitz · Frank Perbet · Francis Song · Chiyuan Zhang · S. M. Ali Eslami · Matthew Botvinick