Workshop: A Blessing in Disguise: The Prospects and Perils of Adversarial Machine Learning

Defending against Model Stealing via Verifying Embedded External Features

Linghui Zhu · Yiming Li · Xiaojun Jia · Yong Jiang · Shutao Xia · Xiaochun Cao


Well-trained models are valuable intellectual property for their owners. Recent studies have revealed that adversaries can 'steal' deployed models even when they have no training samples and can only query the model. Existing defenses alleviate this threat mostly by increasing the cost of model stealing. In this paper, we explore the defense problem from another angle by verifying whether a suspicious model contains the knowledge of defender-specified external features. We embed the external features by poisoning a few training samples via style transfer. After that, we train a meta-classifier, based on the gradient of predictions, to determine whether a suspicious model is stolen from the victim. Our method is inspired by the understanding that stolen models should contain the knowledge of the (external) features learned by the victim model. Experimental results demonstrate that our approach is effective in defending against different model stealing attacks simultaneously.
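
The two steps named in the abstract (embedding external features in a few training samples, then extracting gradient-based inputs for a meta-classifier) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names `embed_external_features` and `gradient_feature`, the linear softmax model, the use of the cross-entropy gradient with respect to the weights, and the constant shift `toy_style` that stands in for real style transfer.

```python
import numpy as np

def embed_external_features(x, rate, style_fn, rng):
    """Apply style_fn to a random fraction `rate` of the samples.

    Labels are not touched; only the inputs of the selected samples change.
    Returns the modified copy and the indices of the modified samples.
    """
    x = x.copy()
    n_styled = int(round(rate * len(x)))
    idx = rng.choice(len(x), size=n_styled, replace=False)
    x[idx] = style_fn(x[idx])
    return x, idx

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def gradient_feature(W, x, y, n_classes):
    """Flattened cross-entropy gradient w.r.t. the weights of a linear
    softmax model (an assumed stand-in for the suspicious model)."""
    p = softmax(x @ W)
    onehot = np.eye(n_classes)[y]
    grad = x.T @ (p - onehot) / len(x)
    return grad.ravel()

# Toy data: 100 samples, 8 input features, 3 classes (all hypothetical).
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 8))
y = rng.integers(0, 3, size=100)

# Hypothetical stand-in for style transfer: a constant shift of the input.
toy_style = lambda batch: batch + 0.5

x_styled, idx = embed_external_features(x, 0.1, toy_style, rng)

# Gradient feature of a randomly initialised model on the styled samples.
W = rng.normal(size=(8, 3))
feat = gradient_feature(W, x_styled[idx], y[idx], n_classes=3)
```

In a full pipeline, vectors like `feat` would be collected from the victim model and from independently trained models, and a binary meta-classifier would be trained on them. That training step is omitted here.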