Oral in Workshop: 2nd ICML Workshop on New Frontiers in Adversarial Machine Learning

A Theoretical Perspective on the Robustness of Feature Extractors

Keywords: [ lower bounds ] [ feature extractors ] [ adversarial robustness ] [ deep neural networks ]


Abstract:

Recent theoretical work on robustness to adversarial examples has derived lower bounds on how robust any model can be once the data distribution and adversarial constraints are specified. However, these bounds do not account for the specific models used in practice, such as neural networks. In this paper, we develop a methodology to analyze the fundamental limits on the robustness of fixed feature extractors, which in turn provides bounds on the robustness of any classifier trained on top of them. The tightness of these bounds depends on the effectiveness of the method used to find collisions between pairs of perturbed examples at deeper layers. For linear feature extractors, we provide closed-form expressions for collision finding, while for piecewise-linear feature extractors we propose a bespoke algorithm based on the iterative solution of a convex program that provably finds collisions. We use our bounds to identify structural features of classifiers that lead to a lack of robustness and to provide insights into the effectiveness of different training methods at obtaining robust feature extractors.
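
To illustrate the collision-finding idea for the linear case (the paper's own closed-form expressions are not reproduced in this abstract), the sketch below poses the problem directly as a linear feasibility program: for a linear feature extractor f(x) = Wx, two inputs x1 and x2 "collide" if there exist perturbations d1, d2 within an assumed l-infinity budget eps such that W(x1 + d1) = W(x2 + d2). The function name, the l-infinity constraint, and the solver choice are assumptions made for this sketch, not details from the paper.

```python
# Hypothetical sketch: collision finding for a linear feature extractor
# f(x) = W @ x under an assumed l-infinity perturbation budget eps.
# A "collision" means W @ (x1 + d1) == W @ (x2 + d2) for some d1, d2
# with max-norm at most eps; here this is posed as a linear
# feasibility problem rather than the paper's closed-form expressions.
import numpy as np
from scipy.optimize import linprog


def find_linear_collision(W, x1, x2, eps):
    """Return (d1, d2) with W(x1 + d1) = W(x2 + d2) and ||di||_inf <= eps, or None."""
    k, n = W.shape
    # Unknowns z = [d1; d2]; the equality constraint W d1 - W d2 = W (x2 - x1)
    # enforces that the two perturbed points map to the same features.
    A_eq = np.hstack([W, -W])
    b_eq = W @ (x2 - x1)
    bounds = [(-eps, eps)] * (2 * n)
    # Feasibility only: minimize a zero objective subject to the constraints.
    res = linprog(c=np.zeros(2 * n), A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if not res.success:
        return None
    return res.x[:n], res.x[n:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((3, 10))   # a wide (dimension-reducing) linear extractor
    x1, x2 = rng.standard_normal(10), rng.standard_normal(10)
    out = find_linear_collision(W, x1, x2, eps=2.0)
    if out is not None:
        d1, d2 = out
        print("collision residual:", np.linalg.norm(W @ (x1 + d1) - W @ (x2 + d2)))
    else:
        print("no collision within the given budget")
```

If such a collision exists, no classifier built on top of f can distinguish the two perturbed inputs, which is what makes collision finding a source of robustness lower bounds for fixed feature extractors.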
