Unlearning Isn’t Forgetting: Revealing Hidden Leakage in Class Unlearning Evaluations
Ali Ebrahimpour-Boroojeny ⋅ Yian Wang ⋅ Hari Sundaram
Abstract
In this paper, we reveal a significant shortcoming in class unlearning evaluations: overlooking the underlying class geometry can cause information leakage about the forgotten class. We further propose a simple unlearning strategy to mitigate this issue. We introduce the Class Membership Inference Attack (CMIA), which uses the probabilities the model assigns to neighboring classes to detect unlearned samples. We find that existing unlearning methods are vulnerable to CMIA across multiple datasets. We then propose a new fine-tuning objective that mitigates this privacy leakage by approximating, for forget-class inputs, the distribution over the remaining classes that a retrained-from-scratch model would produce. To construct this approximation, we estimate inter-class similarity and tilt the target model's distribution accordingly. The resulting Tilted REWeighting (TREW) distribution serves as the target distribution during fine-tuning. We also show that, across multiple benchmarks, TREW matches or surpasses existing unlearning methods on prior unlearning metrics. More specifically, on CIFAR-10, it reduces the gap with retrained models by $19\%$ and $46\%$ for U-LiRA and CMIA scores, respectively, compared to the state-of-the-art method in each category.
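To make the tilting step concrete, below is a minimal sketch of how a TREW-style target distribution could be constructed; it is not the paper's exact formulation. It assumes inter-class similarity to the forget class is given (e.g., estimated from class-mean feature embeddings), and that the target model's distribution over the remaining classes is reweighted exponentially by that similarity. The function name `trew_targets`, the similarity estimator, and the temperature `beta` are all hypothetical choices.

```python
import torch
import torch.nn.functional as F

def trew_targets(logits, forget_class, class_sims, beta=1.0):
    """Hypothetical sketch of a Tilted REWeighting (TREW) target.

    logits:       (batch, num_classes) outputs of the target model
    forget_class: index of the class being unlearned
    class_sims:   (num_classes,) similarity of each class to the forget
                  class, e.g., cosine similarity of class-mean embeddings
                  (an assumed estimator, not necessarily the paper's)
    beta:         assumed tilting temperature controlling how strongly
                  mass shifts toward classes similar to the forget class
    """
    probs = F.softmax(logits, dim=-1)
    # Drop the forget class and renormalize over the remaining classes.
    keep = [c for c in range(probs.shape[1]) if c != forget_class]
    p_keep = probs[:, keep]
    p_keep = p_keep / p_keep.sum(dim=-1, keepdim=True)
    # Tilt toward classes similar to the forget class, mimicking how a
    # retrained-from-scratch model redistributes probability mass among
    # the forgotten class's neighbors.
    tilt = torch.exp(beta * class_sims[keep])
    q = p_keep * tilt
    return q / q.sum(dim=-1, keepdim=True)
```

Under this sketch, fine-tuning on forget-class inputs would minimize a divergence (e.g., KL) between the model's distribution over the remaining classes and these TREW targets, so that neighboring-class probabilities no longer betray the forgotten class.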