Poster in Workshop: A Blessing in Disguise: The Prospects and Perils of Adversarial Machine Learning
Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them
Florian Tramèr
Abstract:
Making classifiers robust to adversarial examples is hard.
Thus, many defenses tackle the seemingly easier task of \emph{detecting} perturbed inputs.
We show a barrier towards this goal. We prove a general \emph{hardness reduction} between detection and classification of adversarial examples: given a robust detector for attacks at distance $\epsilon$ (in some metric), we can build a similarly robust (but inefficient) \emph{classifier} for attacks at distance $\epsilon/2$.
Our reduction is computationally inefficient, and thus cannot be used to build practical classifiers. Instead, it is a useful sanity check to test whether empirical detection results imply something much stronger than the authors presumably anticipated.
To illustrate, we revisit 13 detector defenses. For 11/13 cases, we show that the claimed detection results would imply an inefficient classifier with robustness far beyond the state-of-the-art.
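A rough sketch of how a reduction of this kind can go (an illustrative reconstruction, not necessarily the paper's exact construction; the classifier $f$, detector $g$, metric $d$, and radius $\epsilon$ below are generic placeholders): suppose the pair $(f, g)$ achieves robust detection at radius $\epsilon$, i.e., on a natural input $x^*$ with label $y$ the detector accepts and $f(x^*) = y$, and for every $x'$ with $d(x', x^*) \le \epsilon$, either $g$ rejects $x'$ or $f(x') = y$. Define the (inefficient) classifier
\[
  F(x) \;=\; f(\hat{x}) \quad \text{for any } \hat{x} \text{ such that } d(\hat{x}, x) \le \epsilon/2 \text{ and } g \text{ accepts } \hat{x}.
\]
If $d(x, x^*) \le \epsilon/2$, then $x^*$ itself is a valid candidate, so $F(x)$ is well defined; and by the triangle inequality every accepted candidate $\hat{x}$ satisfies $d(\hat{x}, x^*) \le \epsilon$, so robust detection forces $f(\hat{x}) = y$. Hence $F$ is robust at radius $\epsilon/2$, but evaluating it requires searching the entire $\epsilon/2$-ball around $x$, which is why such a reduction does not yield a practical classifier.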