Timezone: »

Kamalika Chaudhuri
Kamalika Chaudhuri

Bio: Kamalika Chaudhuri is a Professor in the department of Computer Science and Engineering at University of California San Diego, and a Research Scientist in the FAIR team at Meta AI. Her research interests are in the foundations of trustworthy machine learning, which includes problems such as learning from sensitive data while preserving privacy, learning under sampling bias, and in the presence of an adversary. She is particularly interested in privacy-preserving machine learning, which addresses how to learn good models and predictors from sensitive data, while preserving the privacy of individuals.

Title: Do SSL Models Have Déjà Vu? A Case of Unintended Memorization in Self-supervised Learning

Abstract: Self-supervised learning (SSL) algorithms can produce useful image representations by learning to associate different parts of natural images with one another. However, when taken to the extreme, SSL models can unintendedly memorize specific parts in individual training samples rather than learning semantically meaningful associations. In this work, we perform a systematic study of the unintended memorization of image-specific information in SSL models -- which we refer to as déjà vu memorization. Concretely, we show that given the trained model and a crop of a training image containing only the background (e.g., water, sky, grass), it is possible to infer the foreground object with high accuracy or even visually reconstruct it. Furthermore, we show that déjà vu memorization is common to different SSL algorithms, is exacerbated by certain design choices, and cannot be detected by conventional techniques for evaluating representation quality. Our study of déjà vu memorization reveals previously unknown privacy risks in SSL models, as well as suggests potential practical mitigation strategies.

Author Information

Kamalika Chaudhuri (UCSD, Meta AI Research, and FAIR)

More from the Same Authors