Poster in Workshop: ICML 2024 Workshop on Foundation Models in the Wild
On the Privacy Risks of Post-Hoc Explanations of Foundation Models
Catherine Huang · Martin Pawelczyk · Himabindu Lakkaraju
Keywords: [ Explainability ] [ Privacy ] [ Membership Inference Attacks ] [ Adversarial ML ] [ Vision Transformers ]
Foundation models are increasingly deployed in high-stakes contexts in fields such as medicine, finance, and law. In these contexts, there is a trade-off between model explainability and data privacy: explainability promotes transparency, while privacy constrains it. In this work, we push the boundaries of this trade-off: we reveal that post-hoc feature attribution explanations introduce unforeseen privacy risks for the fine-tuning data of vision transformer models. We construct VAR-LRT and L1/L2-LRT, two new membership inference attacks that leverage feature attribution explanations and are significantly more successful than existing explanation-based attacks, particularly in the low false-positive rate regime, which allows an adversary to identify specific fine-tuning dataset members with high confidence. We carry out a systematic empirical investigation of both attacks across 5 vision transformer architectures, 5 benchmark datasets, and 4 state-of-the-art post-hoc explanation methods. Our work addresses the lack of trust in post-hoc explanation methods that has contributed to the slow adoption of foundation models in high-stakes domains.
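To make the attack idea concrete, below is a minimal sketch of how a likelihood-ratio-test membership inference attack on explanation statistics might look. The abstract names VAR-LRT and L1/L2-LRT but does not spell out their formulas; the summary statistics (variance, L1/L2 norm of an attribution map), the use of shadow models, and the Gaussian likelihood-ratio form are all assumptions of this sketch, modeled on standard LRT-style attacks rather than the authors' exact method. The function names `attack_statistic` and `lrt_score` are hypothetical.

```python
# Hypothetical sketch of an explanation-based membership inference test.
# Statistic definitions and the Gaussian likelihood-ratio form are assumptions,
# not the authors' exact VAR-LRT / L1/L2-LRT formulation.
import numpy as np
from scipy.stats import norm


def attack_statistic(attribution: np.ndarray, kind: str = "var") -> float:
    """Scalar summary of a per-example feature attribution map."""
    a = attribution.ravel()
    if kind == "var":   # VAR-LRT-style statistic (assumed: variance of attribution values)
        return float(np.var(a))
    if kind == "l1":    # L1-LRT-style statistic (assumed: L1 norm)
        return float(np.abs(a).sum())
    if kind == "l2":    # L2-LRT-style statistic (assumed: L2 norm)
        return float(np.linalg.norm(a))
    raise ValueError(f"unknown statistic kind: {kind}")


def lrt_score(stat: float, in_stats: np.ndarray, out_stats: np.ndarray) -> float:
    """Gaussian likelihood-ratio score: higher means "more likely a fine-tuning member".

    in_stats / out_stats hold the same example's statistic computed on shadow models
    fine-tuned with / without that example (an assumption of this sketch).
    """
    mu_in, sd_in = in_stats.mean(), in_stats.std() + 1e-12
    mu_out, sd_out = out_stats.mean(), out_stats.std() + 1e-12
    return norm.logpdf(stat, mu_in, sd_in) - norm.logpdf(stat, mu_out, sd_out)
```

In this setup, the adversary would threshold `lrt_score` at a value calibrated to a very low false-positive rate, which is the regime the abstract highlights for confidently identifying individual fine-tuning dataset members.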