
Why adversarial training can hurt robust accuracy
Jacob Clarysse · Julia Hörrmann · Fanny Yang

Machine learning classifiers with high test accuracy often perform poorly under adversarial attacks. It is commonly believed that adversarial training alleviates this issue. In this paper, we demonstrate that, surprisingly, the opposite can be true for a natural class of perceptible perturbations: even though adversarial training helps when enough data is available, it may in fact hurt robust generalization in the small-sample-size regime. We first prove this phenomenon for a high-dimensional linear classification setting with noiseless observations. Using intuitive insights from the proof, we then identify perturbations on standard image datasets for which this behavior persists. Specifically, it occurs for perceptible attacks that effectively reduce class information, such as object occlusions or corruptions.
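To make the setting concrete, the following is a minimal sketch of adversarial training for a linear classifier, in the spirit of the high-dimensional linear setting the abstract describes. It is an illustrative toy, not the paper's exact construction: the data model (noiseless labels given by a single signal coordinate), the L∞ attack budget `eps`, and the hinge-loss subgradient training loop are all assumptions chosen for brevity. For a linear model, the worst-case L∞ perturbation has the closed form `delta = -eps * y * sign(w)`, which the adversarial branch exploits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption, not the paper's construction): noiseless labels
# determined by a single signal coordinate, few samples n in dimension d.
d, n, eps = 50, 10, 0.5
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0])

def train(X, y, adversarial, steps=500, lr=0.1, eps=eps):
    """Subgradient descent on the hinge loss. If `adversarial`, each step
    trains on the worst-case L-inf perturbation of size eps, which for a
    linear model is delta = -eps * y * sign(w) in closed form."""
    w = np.zeros(d)
    for _ in range(steps):
        Xp = X - eps * y[:, None] * np.sign(w)[None, :] if adversarial else X
        margins = y * (Xp @ w)
        # Hinge-loss subgradient: only examples with margin < 1 contribute.
        grad = -(y[:, None] * Xp)[margins < 1].sum(axis=0) / len(y)
        w -= lr * grad
    return w

def robust_accuracy(w, X, y, eps=eps):
    # Fraction of points correctly classified under the worst-case
    # L-inf attack: margin shrinks by eps * ||w||_1.
    margins = y * (X @ w) - eps * np.abs(w).sum()
    return (margins > 0).mean()

w_std = train(X, y, adversarial=False)
w_adv = train(X, y, adversarial=True)

X_test = rng.normal(size=(2000, d))
y_test = np.sign(X_test[:, 0])
print("standard:", robust_accuracy(w_std, X_test, y_test))
print("adversarial:", robust_accuracy(w_adv, X_test, y_test))
```

With such a small sample size relative to the dimension, the robust test accuracy of the adversarially trained model need not exceed that of the standard one, which is the qualitative phenomenon the paper studies; the exact numbers here depend on the random seed and the toy data model.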

Author Information

Jacob Clarysse (ETH Zurich)
Julia Hörrmann (ETH Zurich)
Fanny Yang (ETH Zurich)