Model extraction attacks attempt to replicate a target machine learning model by querying its inference API. State-of-the-art attacks are learning-based: they construct replicas by supervised training on the target model's predictions. An emerging class of attacks instead exploits algebraic properties to obtain high-fidelity replicas using orders of magnitude fewer queries. So far, these algebraic attacks have been limited to neural networks with few hidden layers and ReLU activations. In this paper we present algebraic and hybrid algebraic/learning-based attacks on large-scale natural language models. We consider a grey-box setting, targeting models with a pre-trained (public) encoder followed by a single (private) classification layer. Our key findings are that (i) with a frozen encoder, high-fidelity extraction is possible with a small number of in-distribution queries, making extraction attacks indistinguishable from legitimate use; and (ii) when the encoder is fine-tuned, a hybrid learning-based/algebraic attack improves over the learning-based state of the art without requiring additional queries.
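To make the grey-box setting concrete, here is a minimal sketch of how an algebraic attack can recover a single private linear classification layer sitting on top of a frozen public encoder. This is an illustration, not the paper's algorithm: it assumes the inference API returns raw logits and that the attacker can run the public encoder locally; the names `encode` and `query_api` are hypothetical interfaces.

```python
import numpy as np

def extract_linear_head(texts, encode, query_api):
    """Recover the private head (logits = W @ h + b) by least squares.

    encode(text)    -> np.ndarray of shape (d,)  # public frozen encoder
    query_api(text) -> np.ndarray of shape (k,)  # victim's logits (assumed)
    """
    H = np.stack([encode(t) for t in texts])         # (n, d) embeddings
    Y = np.stack([query_api(t) for t in texts])      # (n, k) observed logits
    # Append a constant column so the bias is recovered jointly with W:
    # [H | 1] @ [W^T; b^T] = Y is an ordinary least-squares problem.
    H1 = np.hstack([H, np.ones((len(texts), 1))])    # (n, d+1)
    theta, *_ = np.linalg.lstsq(H1, Y, rcond=None)   # (d+1, k)
    W, b = theta[:-1].T, theta[-1]                   # W: (k, d), b: (k,)
    return W, b
```

With at least d+1 queries whose embeddings span the augmented (d+1)-dimensional space, the system is exactly determined, which is why a frozen encoder admits high-fidelity extraction from so few in-distribution queries.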
Author Information
Santiago Zanella-Beguelin (Microsoft Research)

I am a member of the Confidential AI team at Microsoft Research Cambridge, working on security and privacy of machine learning systems. Previously, I worked in the Constructive Security and Programming Principles and Tools teams, most notably on Project Everest, building secure implementations of key components of the HTTPS ecosystem. Before that, I held a Research Engineer position at Inria Paris, and Post-Doctoral positions at Microsoft Research and IMDEA Software. I received my PhD from École Nationale Supérieure des Mines de Paris while working at Inria Sophia Antipolis-Méditerranée on the formal verification of game-based proofs of security in cryptography.
Shruti Tople (Microsoft Research)
Andrew Paverd (Microsoft Research)
Boris Köpf (Microsoft Research)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Spotlight: Grey-box Extraction of Natural Language Models
Fri. Jul 23rd 12:45 -- 12:50 AM
More from the Same Authors
-
2023 Poster: Bayesian Estimation of Differential Privacy
Santiago Zanella-Beguelin · Lukas Wutschitz · Shruti Tople · Ahmed Salem · Victor Ruehle · Andrew Paverd · Mohammad Naseri · Boris Köpf · Dan Jones
-
2021 Poster: Domain Generalization using Causal Matching
Divyat Mahajan · Shruti Tople · Amit Sharma
-
2021 Oral: Domain Generalization using Causal Matching
Divyat Mahajan · Shruti Tople · Amit Sharma
-
2020 Poster: Alleviating Privacy Attacks via Causal Learning
Shruti Tople · Amit Sharma · Aditya Nori