OptiFluence: Principled Design of Privacy Canaries
Mohammad Yaghini ⋅ Michael Aerni ⋅ Junrui Zhang ⋅ Nicolas Papernot ⋅ Florian Tramer
Abstract
Privacy auditing has emerged as a practical tool for empirically estimating training data leakage in machine learning models, in contrast to the provable but often overly pessimistic bounds of differential privacy analysis. A common strategy is to use membership inference attacks to detect the presence of specific canaries (data points chosen to maximize attack success) in the training data. However, existing canary designs are largely heuristic, relying on mislabeled or out-of-distribution samples. We address this gap by formulating canary design as a bilevel optimization problem: the model is trained in the inner loop, and the canary is optimized in the outer loop to maximize its detectability. To solve this problem, we develop OptiFluence, a scalable optimization framework that combines (i) initialization by selecting candidates using influence functions and (ii) unrolled optimization with memory-efficient techniques. Our approach achieves strong empirical performance on four datasets. Optimized canaries are 415$\times$ more detectable than in-distribution baselines on CIFAR-10/100, reaching a near-perfect 99% true positive rate at a 0.1% false positive rate. Critically, these canaries transfer effectively across model architectures without retraining, enabling practical third-party privacy audits: regulators and auditors can assess model privacy without access to proprietary training infrastructure or substantial computational resources.
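As a minimal sketch of the bilevel formulation described above (the notation here is ours, not taken from the paper), let $D$ denote the training set, $c$ the canary, $\mathcal{L}$ the training loss, and $\mathcal{A}$ a membership-inference detectability score for $c$ against a trained model:
\[
\max_{c}\;\; \mathcal{A}\big(\theta^{\star}(c),\, c\big)
\qquad \text{s.t.} \qquad
\theta^{\star}(c) \in \arg\min_{\theta}\; \mathcal{L}\big(\theta;\, D \cup \{c\}\big).
\]
The inner problem trains the model on $D \cup \{c\}$, while the outer problem adjusts $c$ so that the resulting model leaks the canary's membership as strongly as possible; unrolled optimization approximates the outer gradient by differentiating through a truncated sequence of inner training steps.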