Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them
Florian Tramer
Making classifiers robust to adversarial examples is hard.
Thus, many defenses tackle the seemingly easier task of \emph{detecting} perturbed inputs.
We show a barrier towards this goal. We prove a general \emph{hardness reduction} between detection and classification of adversarial examples: given a robust detector for attacks at distance $\epsilon$ (in some metric), we can build a similarly robust (but inefficient) \emph{classifier} for attacks at distance $\epsilon/2$.
Our reduction is computationally inefficient, and thus cannot be used to build practical classifiers. Instead, it is a useful sanity check to test whether empirical detection results imply something much stronger than the authors presumably anticipated (indeed, building even an inefficient robust classifier is presumed to be very challenging).
To illustrate, we revisit $13$ detector defenses. For $10/13$ cases, we show that the claimed detection results would imply an inefficient classifier with robustness far beyond the state-of-the-art.
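The reduction stated in the abstract is simple enough to sketch concretely. Below is a minimal illustrative sketch, not code from the paper: the helper names `classify`, `is_flagged`, and `ball` are hypothetical stand-ins for the base classifier, the robust detector, and an enumeration of points within a given distance of the input.

```python
# Minimal sketch of the detection-to-classification reduction described
# in the abstract.  All helper names are hypothetical stand-ins, not
# code from the paper.

from typing import Callable, Iterable

REJECT = None  # sentinel: the constructed classifier abstains


def reduced_classifier(
    x,
    classify: Callable,              # base classifier f: input -> label
    is_flagged: Callable,            # robust detector: True if the input is flagged as adversarial
    ball: Callable[..., Iterable],   # assumed: enumerates points within a given distance of x
    eps: float,
):
    """Classify x under perturbations of size eps/2, given robust detection at eps.

    If x lies within eps/2 of some clean input x0, then x0 itself is inside
    the searched ball, so the search finds an accepted point; and any accepted
    point z is within eps of x0, so a detector robust at distance eps
    guarantees that f(z) is the correct label.
    """
    for z in ball(x, eps / 2):       # exhaustive search: inefficient by design
        if not is_flagged(z):
            return classify(z)       # detector accepts z, so its label is trusted
    return REJECT                    # every candidate point was flagged
```

The exhaustive search over the ball is exactly the computational inefficiency the abstract notes: the construction establishes a hardness relationship between the two tasks, not a practical defense.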
Author Information
Florian Tramer (Stanford University)
More from the Same Authors
- 2021: Data Poisoning Won't Save You From Facial Recognition
  Evani Radiya-Dixit · Florian Tramer
- 2023: Backdoor Attacks for In-Context Learning with Language Models
  Nikhil Kandpal · Matthew Jagielski · Florian Tramer · Nicholas Carlini
- 2023: Evading Black-box Classifiers Without Breaking Eggs
  Edoardo Debenedetti · Nicholas Carlini · Florian Tramer
- 2023 Poster: Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems
  Chawin Sitawarin · Florian Tramer · Nicholas Carlini
- 2022 Poster: Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them
  Florian Tramer
- 2022 Oral: Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them
  Florian Tramer
- 2021: Contributed Talk #4
  Florian Tramer
- 2021 Poster: Label-Only Membership Inference Attacks
  Christopher Choquette-Choo · Florian Tramer · Nicholas Carlini · Nicolas Papernot
- 2021 Spotlight: Label-Only Membership Inference Attacks
  Christopher Choquette-Choo · Florian Tramer · Nicholas Carlini · Nicolas Papernot
- 2020 Poster: Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations
  Florian Tramer · Jens Behrmann · Nicholas Carlini · Nicolas Papernot · Joern-Henrik Jacobsen
- 2019 Workshop: Workshop on the Security and Privacy of Machine Learning
  Nicolas Papernot · Florian Tramer · Bo Li · Dan Boneh · David Evans · Somesh Jha · Percy Liang · Patrick McDaniel · Jacob Steinhardt · Dawn Song