Skip to yearly menu bar Skip to main content


Transferable Clean-Label Poisoning Attacks on Deep Neural Nets

Chen Zhu · W. Ronny Huang · Hengduo Li · Gavin Taylor · Christoph Studer · Tom Goldstein

Pacific Ballroom #68

Keywords: [ Interpretability ] [ Adversarial Examples ]


In this paper, we explore clean-label poisoning attacks on deep convolutional networks with access to neither the network's output nor its architecture or parameters. Our goal is to ensure that after injecting the poisons into the training data, a model with unknown architecture and parameters trained on that data will misclassify the target image into a specific class. To achieve this goal, we generate multiple poison images from the base class by adding small perturbations which cause the poison images to trap the target image within their convex polytope in feature space. We also demonstrate that using Dropout during crafting of the poisons and enforcing this objective in multiple layers enhances transferability, enabling attacks against both the transfer learning and end-to-end training settings. We demonstrate transferable attack success rates of over 50% by poisoning only 1% of the training set.

Live content is unavailable. Log in and register to view live content