ICML Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them

Poster in Workshop: A Blessing in Disguise: The Prospects and Perils of Adversarial Machine Learning

Florian Tramer

Abstract: Making classifiers robust to adversarial examples is hard. Thus, many defenses tackle the seemingly easier task of detecting perturbed inputs. We show a barrier towards this goal. We prove a general hardness reduction between detection and classification of adversarial examples: given a robust detector for attacks at distance $\epsilon$ (in some metric), we can build a similarly robust (but inefficient) classifier for attacks at distance $\epsilon/2$. Our reduction is computationally inefficient, and thus cannot be used to build practical classifiers. Instead, it is a useful sanity check to test whether empirical detection results imply something much stronger than the authors presumably anticipated. To illustrate, we revisit 13 detector defenses. For 10/13 cases, we show that the claimed detection results would imply an inefficient classifier with robustness far beyond the state of the art.
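
The abstract does not spell out the construction behind the reduction. As an illustration only, here is a minimal sketch of one way a detector robust at distance $\epsilon$ could be turned into a classifier robust at distance $\epsilon/2$: search the $\epsilon/2$-ball around the input for any point the detector accepts, and return the detector's label there. Every name below (`classifier_from_detector`, `toy_detector`, `grid_candidates`, `REJECT`) is hypothetical, and the 1-D grid search is an assumed stand-in for the exhaustive, inefficient search; this is not the author's code.

```python
from typing import Callable, Iterable, Optional

REJECT = None  # assumed convention: the detector returns None to flag an input


def classifier_from_detector(
    detector: Callable[[float], Optional[int]],
    candidates: Callable[[float, float], Iterable[float]],
    eps: float,
    default_label: int = 0,
) -> Callable[[float], int]:
    """Build a classifier for distance eps/2 from a detector for distance eps."""

    def classify(x: float) -> int:
        # Brute-force search of the eps/2-ball around x (the inefficient step).
        for x_prime in candidates(x, eps / 2):
            label = detector(x_prime)
            if label is not REJECT:
                # x_prime is within eps/2 of x, and x is within eps/2 of some
                # clean point, so x_prime is within eps of that clean point.
                # A detector robust at eps cannot accept x_prime with a wrong label.
                return label
        return default_label  # nothing accepted nearby: answer arbitrarily

    return classify


def toy_detector(x: float) -> Optional[int]:
    """Hypothetical 1-D detector: class 0 on the left, class 1 on the right,
    and rejection in the gap between them."""
    if x <= -2.0:
        return 0
    if x >= 2.0:
        return 1
    return REJECT


def grid_candidates(x: float, radius: float, step: float = 0.5) -> Iterable[float]:
    """Finite grid standing in for an exhaustive search of the ball around x."""
    n = int(radius / step)
    return [x + k * step for k in range(-n, n + 1)]


# Clean points sit at -3 (class 0) and +3 (class 1); the toy detector is
# robust around them at eps = 2, so the classifier targets eps/2 = 1.
classify = classifier_from_detector(toy_detector, grid_candidates, eps=2.0)
```

In this toy setting, `classify(-2.0)` and `classify(2.0)` are perturbations of size 1 of the clean points at -3 and +3, and the search finds an accepted neighbor carrying the clean point's label. The cost is one detector call per candidate, which is why a reduction of this kind serves as a sanity check rather than a practical defense.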
