Oral in Workshop: 2nd ICML Workshop on New Frontiers in Adversarial Machine Learning

A Theoretical Perspective on the Robustness of Feature Extractors

Keywords: [ lower bounds ] [ feature extractors ] [ adversarial robustness ] [ deep neural networks ]


Abstract:

Recent theoretical work on robustness to adversarial examples has derived lower bounds on how robust any model can be once the data distribution and adversarial constraints are specified. However, these bounds do not account for the specific models used in practice, such as neural networks. In this paper, we develop a methodology to analyze the fundamental limits on the robustness of fixed feature extractors, which in turn provides bounds on the robustness of any classifier trained on top of them. The tightness of these bounds depends on the effectiveness of the method used to find collisions between pairs of perturbed examples at deeper layers. For linear feature extractors, we provide closed-form expressions for collision finding, while for piecewise-linear feature extractors we propose a bespoke algorithm based on the iterative solution of a convex program that provably finds collisions. We use our bounds to identify structural features of classifiers that lead to a lack of robustness and to provide insights into the effectiveness of different training methods at obtaining robust feature extractors.
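
To illustrate the collision-finding idea for the linear case (the paper's own closed-form expressions are not reproduced in this abstract), the sketch below poses the problem directly as a linear feasibility program: for a linear feature extractor f(x) = Wx, two inputs x1 and x2 "collide" if there exist perturbations d1, d2 within an assumed l-infinity budget eps such that W(x1 + d1) = W(x2 + d2). The function name, the l-infinity constraint, and the solver choice are assumptions made for this sketch, not details from the paper.

```python
# Hypothetical sketch: collision finding for a linear feature extractor
# f(x) = W @ x under an assumed l-infinity perturbation budget eps.
# A "collision" means W @ (x1 + d1) == W @ (x2 + d2) for some d1, d2
# with max-norm at most eps; here this is posed as a linear
# feasibility problem rather than the paper's closed-form expressions.
import numpy as np
from scipy.optimize import linprog


def find_linear_collision(W, x1, x2, eps):
    """Return (d1, d2) with W(x1 + d1) = W(x2 + d2) and ||di||_inf <= eps, or None."""
    k, n = W.shape
    # Unknowns z = [d1; d2]; the equality constraint W d1 - W d2 = W (x2 - x1)
    # enforces that the two perturbed points map to the same features.
    A_eq = np.hstack([W, -W])
    b_eq = W @ (x2 - x1)
    bounds = [(-eps, eps)] * (2 * n)
    # Feasibility only: minimize a zero objective subject to the constraints.
    res = linprog(c=np.zeros(2 * n), A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if not res.success:
        return None
    return res.x[:n], res.x[n:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((3, 10))   # a wide (dimension-reducing) linear extractor
    x1, x2 = rng.standard_normal(10), rng.standard_normal(10)
    out = find_linear_collision(W, x1, x2, eps=2.0)
    if out is not None:
        d1, d2 = out
        print("collision residual:", np.linalg.norm(W @ (x1 + d1) - W @ (x2 + d2)))
    else:
        print("no collision within the given budget")
```

If such a collision exists, no classifier built on top of f can distinguish the two perturbed inputs, which is what makes collision finding a source of robustness lower bounds for fixed feature extractors.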
