Adversarial examples are malicious inputs crafted to induce misclassification. Commonly studied \emph{sensitivity-based} adversarial examples introduce semantically small changes to an input that result in a different model prediction. This paper studies a complementary failure mode, \emph{invariance-based} adversarial examples, which introduce minimal semantic changes that modify an input's true label yet preserve the model's prediction. We demonstrate fundamental tradeoffs between these two types of adversarial examples. We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks, and that new approaches are needed to resist both attack types. In particular, we break state-of-the-art adversarially trained and \emph{certifiably robust} models by generating small perturbations that the models are (provably) robust to, yet that change an input's class according to human labelers. Finally, we formally show that the existence of excessively invariant classifiers arises from the presence of \emph{overly robust} predictive features in standard datasets.
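To make the distinction concrete, the following minimal sketch contrasts the two attack objectives inside the same $\ell_\infty$ ball of radius eps around an input. This is an illustration, not the paper's attack procedure: the callables model and human_label, and the random-search loop, are hypothetical stand-ins.

# Minimal illustrative sketch (not the authors' implementation): both attacks
# search the same L_inf ball of radius eps around x, but with opposite goals.
# `model` (the classifier) and `human_label` (a human-labeling oracle) are
# assumed callables that return a class label for an input array in [0, 1].
import numpy as np

def sensitivity_attack(x, y_true, model, eps, trials=1000):
    """Find x_adv with ||x_adv - x||_inf <= eps whose model prediction differs
    from y_true; eps is assumed small enough that the true label is unchanged."""
    for _ in range(trials):
        x_adv = np.clip(x + np.random.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
        if model(x_adv) != y_true:      # the model changes its prediction ...
            return x_adv                # ... although the semantics did not change
    return None

def invariance_attack(x, y_true, model, human_label, eps, trials=1000):
    """Find x_adv with ||x_adv - x||_inf <= eps whose true label (per the human
    oracle) differs from y_true while the model's prediction stays unchanged."""
    y_model = model(x)
    for _ in range(trials):
        x_adv = np.clip(x + np.random.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
        if human_label(x_adv) != y_true and model(x_adv) == y_model:
            return x_adv                # the model is invariant, yet the true class changed
    return None

Under this framing, a model certified to be robust within the eps ball is an ideal target for the second search: its prediction cannot change inside the ball, so any point in the ball whose true label differs is an invariance-based adversarial example.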
Author Information
Florian Tramer (Stanford University)
Jens Behrmann (University of Bremen)
Nicholas Carlini (Google)
Nicolas Papernot (University of Toronto and Vector Institute)
Joern-Henrik Jacobsen (Apple Inc.)
More from the Same Authors
- 2021: Data Poisoning Won’t Save You From Facial Recognition
  Evani Radiya-Dixit · Florian Tramer
- 2021: Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them
  Florian Tramer
- 2023: Backdoor Attacks for In-Context Learning with Language Models
  Nikhil Kandpal · Matthew Jagielski · Florian Tramer · Nicholas Carlini
- 2023: Counterfactual Memorization in Neural Language Models
  Chiyuan Zhang · Daphne Ippolito · Katherine Lee · Matthew Jagielski · Florian Tramer · Nicholas Carlini
- 2023: Evading Black-box Classifiers Without Breaking Eggs
  Edoardo Debenedetti · Nicholas Carlini · Florian Tramer
- 2023 Poster: Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems
  Chawin Sitawarin · Florian Tramer · Nicholas Carlini
- 2022 Poster: On the Difficulty of Defending Self-Supervised Learning against Model Extraction
  Adam Dziedzic · Nikita Dhawan · Muhammad Ahmad Kaleem · Jonas Guan · Nicolas Papernot
- 2022 Spotlight: On the Difficulty of Defending Self-Supervised Learning against Model Extraction
  Adam Dziedzic · Nikita Dhawan · Muhammad Ahmad Kaleem · Jonas Guan · Nicolas Papernot
- 2022 Poster: Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them
  Florian Tramer
- 2022 Oral: Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them
  Florian Tramer
- 2021: Contributed Talk #4
  Florian Tramer
- 2021 Poster: Markpainting: Adversarial Machine Learning meets Inpainting
  David G Khachaturov · Ilia Shumailov · Yiren Zhao · Nicolas Papernot · Ross Anderson
- 2021 Poster: Label-Only Membership Inference Attacks
  Christopher Choquette-Choo · Florian Tramer · Nicholas Carlini · Nicolas Papernot
- 2021 Poster: Environment Inference for Invariant Learning
  Elliot Creager · Joern-Henrik Jacobsen · Richard Zemel
- 2021 Spotlight: Label-Only Membership Inference Attacks
  Christopher Choquette-Choo · Florian Tramer · Nicholas Carlini · Nicolas Papernot
- 2021 Spotlight: Environment Inference for Invariant Learning
  Elliot Creager · Joern-Henrik Jacobsen · Richard Zemel
- 2021 Spotlight: Markpainting: Adversarial Machine Learning meets Inpainting
  David G Khachaturov · Ilia Shumailov · Yiren Zhao · Nicolas Papernot · Ross Anderson
- 2021 Poster: Out-of-Distribution Generalization via Risk Extrapolation (REx)
  David Krueger · Ethan Caballero · Joern-Henrik Jacobsen · Amy Zhang · Jonathan Binas · Dinghuai Zhang · Remi Le Priol · Aaron Courville
- 2021 Oral: Out-of-Distribution Generalization via Risk Extrapolation (REx)
  David Krueger · Ethan Caballero · Joern-Henrik Jacobsen · Amy Zhang · Jonathan Binas · Dinghuai Zhang · Remi Le Priol · Aaron Courville
- 2020: Panel 1
  Deborah Raji · Tawana Petty · Nicolas Papernot · Piotr Sapiezynski · Aleksandra Korolova
- 2020: What does it mean for ML to be trustworthy?
  Nicolas Papernot
- 2020 Poster: How to Train Your Neural ODE: the World of Jacobian and Kinetic Regularization
  Chris Finlay · Joern-Henrik Jacobsen · Levon Nurbekyan · Adam Oberman
- 2020 Poster: Learning the Stein Discrepancy for Training and Evaluating Energy-Based Models without Sampling
  Will Grathwohl · Kuan-Chieh Wang · Joern-Henrik Jacobsen · David Duvenaud · Richard Zemel
- 2019: Invertible Residual Networks and a Novel Perspective on Adversarial Examples
  Joern-Henrik Jacobsen
- 2019 Workshop: Workshop on the Security and Privacy of Machine Learning
  Nicolas Papernot · Florian Tramer · Bo Li · Dan Boneh · David Evans · Somesh Jha · Percy Liang · Patrick McDaniel · Jacob Steinhardt · Dawn Song
- 2019 Poster: Flexibly Fair Representation Learning by Disentanglement
  Elliot Creager · David Madras · Joern-Henrik Jacobsen · Marissa Weis · Kevin Swersky · Toniann Pitassi · Richard Zemel
- 2019 Oral: Flexibly Fair Representation Learning by Disentanglement
  Elliot Creager · David Madras · Joern-Henrik Jacobsen · Marissa Weis · Kevin Swersky · Toniann Pitassi · Richard Zemel
- 2019 Poster: Adversarial Examples Are a Natural Consequence of Test Error in Noise
  Justin Gilmer · Nicolas Ford · Nicholas Carlini · Ekin Dogus Cubuk
- 2019 Poster: Invertible Residual Networks
  Jens Behrmann · Will Grathwohl · Ricky T. Q. Chen · David Duvenaud · Joern-Henrik Jacobsen
- 2019 Poster: Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition
  Yao Qin · Nicholas Carlini · Garrison Cottrell · Ian Goodfellow · Colin Raffel
- 2019 Oral: Invertible Residual Networks
  Jens Behrmann · Will Grathwohl · Ricky T. Q. Chen · David Duvenaud · Joern-Henrik Jacobsen
- 2019 Oral: Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition
  Yao Qin · Nicholas Carlini · Garrison Cottrell · Ian Goodfellow · Colin Raffel
- 2019 Oral: Adversarial Examples Are a Natural Consequence of Test Error in Noise
  Justin Gilmer · Nicolas Ford · Nicholas Carlini · Ekin Dogus Cubuk