Unlearning with Asymmetric Sources: Improved Unlearning-Utility Trade-off with Public Data
Ahmed Mehdi Inane ⋅ Vincent Quirion ⋅ Gintare Karolina Dziugaite ⋅ Ioannis Mitliagkas
Abstract
Noise-based certified machine unlearning currently faces a hard ceiling: the noise magnitude required to certify unlearning typically destroys model utility, particularly for large-scale deletion requests. While leveraging public data is a standard technique in differential privacy to relax this tension, its role in unlearning remains unexplored. We address this gap by introducing **Asymmetric Langevin Unlearning (ALU)**, a framework that uses public data to mitigate privacy costs. We prove that public data injection suppresses the unlearning cost by a factor of $O(1/n_{\mathrm{pub}}^2)$, guaranteeing a strict computational advantage over retraining. This establishes a new control mechanism: practitioners can mitigate the need for high noise, and the associated utility loss, by increasing the volume of public data. Crucially, we analyze the realistic setting of **distribution mismatch**, explicitly characterizing how shifts between public and private sources impact utility. We show that ALU enables "mass unlearning" of constant dataset fractions, a regime where standard symmetric methods become impractical, while maintaining high utility. Empirical evaluations using variational Rényi divergence and membership inference attacks confirm that ALU effectively thwarts privacy attacks while preserving utility under reasonable distribution shifts.
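To make the stated $O(1/n_{\mathrm{pub}}^2)$ scaling concrete, below is a minimal illustrative sketch, not the paper's implementation: it assumes a hypothetical cost bound of the form $c \cdot B / n_{\mathrm{pub}}^2$ with placeholder constants, and simply tabulates how such a bound shrinks as the public-sample count grows.

```python
# Illustrative sketch only: the abstract states that public-data injection
# suppresses the unlearning cost by a factor of O(1 / n_pub^2). The functional
# form and constants below are hypothetical placeholders, not values from the paper.

def suppressed_unlearning_cost(base_cost: float, n_pub: int, c: float = 1.0) -> float:
    """Hypothetical cost bound after injecting n_pub public samples."""
    return c * base_cost / (n_pub ** 2)

if __name__ == "__main__":
    base = 1.0  # placeholder baseline cost (arbitrary units)
    for n_pub in (1, 10, 100, 1000):
        cost = suppressed_unlearning_cost(base, n_pub)
        print(f"n_pub = {n_pub:5d}  ->  cost bound ~ {cost:.2e}")
```

Under this scaling, a tenfold increase in public samples shrinks the bound a hundredfold, which is the lever the abstract describes for trading public data against injected noise and its utility loss.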