Poster in Workshop: Next Generation of AI Safety
Is My Data Safe? Predicting Membership Inference Success for Individual Instances
Tobias Leemann · Bardh Prenkaj · Gjergji Kasneci
Keywords: [ Privacy ] [ membership inference ]
We perform an extensive empirical investigation of three recent membership inference (MI) attacks on vision and language models. Our investigation includes the newly proposed Gradient Likelihood Ratio (GLiR) attack, a white-box attack with theoretical optimality guarantees. Prior research has suggested that white-box attacks cannot outperform black-box MI attacks. In this work, we challenge this hypothesis by running and evaluating this attack on real-world models with up to 53M parameters for the first time. We find that this white-box attack does indeed have the potential to outperform other attacks. We then focus on the problem of MI susceptibility prediction, which is concerned with efficiently identifying, a priori, the individuals who are most at risk of attack. In doing so, we uncover which characteristics make instances susceptible to MI and whether the targeted instances are the same across attacks. We implement and study over 20 predictors of attack success. We find that GLiR mostly targets the same points as loss-based attacks and that vulnerable instances can be efficiently predicted.
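The loss-based attacks the abstract compares against follow a simple recipe: an adversary flags instances with unusually low model loss as likely training members. A minimal sketch of this idea on toy data (the losses, labels, and threshold below are illustrative, not taken from the paper):

```python
import numpy as np

def loss_threshold_attack(losses, threshold):
    """Loss-based membership inference: flag instances whose loss
    falls below the threshold as training members (1), else 0."""
    return (np.asarray(losses) < threshold).astype(int)

# Hypothetical per-example losses: members tend to have lower loss
# because the model has overfit to them.
member_losses = np.array([0.05, 0.10, 0.20, 0.08])
nonmember_losses = np.array([0.90, 1.20, 0.75, 1.05])

losses = np.concatenate([member_losses, nonmember_losses])
labels = np.concatenate([np.ones(4, dtype=int), np.zeros(4, dtype=int)])

preds = loss_threshold_attack(losses, threshold=0.5)
accuracy = (preds == labels).mean()
print(accuracy)  # perfectly separable on this toy data: 1.0
```

In practice the threshold is calibrated (e.g. per-example, as in likelihood-ratio attacks), and real member/non-member loss distributions overlap far more than in this toy example.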