Session
Deep Learning (Adversarial) 1
Differentiable Abstract Interpretation for Provably Robust Neural Networks
Matthew Mirman · Timon Gehr · Martin Vechev
We introduce a scalable method for training neural networks based on abstract interpretation. We show how to successfully apply an approximate end-to-end differentiable abstract interpreter to train large networks that (i) are certifiably more robust to adversarial perturbations and (ii) achieve improved accuracy.
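For intuition, the sketch below illustrates the general idea of training through a differentiable abstract domain, using the simple interval (box) domain rather than the paper's more precise abstract transformers. The function names, layer handling, and worst-case loss shown are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def interval_affine(W, b, lo, hi):
    # Propagate the box [lo, hi] through x -> x @ W.T + b (exact for affine maps).
    center, radius = (lo + hi) / 2, (hi - lo) / 2
    new_center = center @ W.t() + b
    new_radius = radius @ W.abs().t()
    return new_center - new_radius, new_center + new_radius

def certified_loss(layers, x, y, eps):
    # Differentiable upper bound on the worst-case cross-entropy inside an
    # l_inf ball of radius eps around x; minimized like an ordinary loss.
    lo, hi = x - eps, x + eps
    for i, layer in enumerate(layers):          # layers: list of nn.Linear
        lo, hi = interval_affine(layer.weight, layer.bias, lo, hi)
        if i < len(layers) - 1:                 # ReLU is monotone, so apply it to both ends
            lo, hi = F.relu(lo), F.relu(hi)
    idx = torch.arange(x.size(0))
    worst = hi.clone()
    worst[idx, y] = lo[idx, y]                  # true-class logit at its lower bound
    return F.cross_entropy(worst, y)

# Usage sketch: layers = [nn.Linear(784, 256), nn.Linear(256, 10)]
# loss = certified_loss(layers, x_batch.view(-1, 784), y_batch, eps=0.1); loss.backward()
```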
Provable Defenses against Adversarial Examples via the Convex Outer Adversarial Polytope
Eric Wong · Zico Kolter
We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well. The basic idea is to consider a convex outer approximation of the set of activations reachable through a norm-bounded perturbation, and we develop a robust optimization procedure that minimizes the worst case loss over this outer region (via a linear program). Crucially, we show that the dual problem to this linear program can be represented itself as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss. The end result is that by executing a few more forward and backward passes through a slightly modified version of the original network (though possibly with much larger batch sizes), we can learn a classifier that is provably robust to any norm-bounded adversarial attack. We illustrate the approach on a number of tasks to train classifiers with robust adversarial guarantees (e.g. for MNIST, we produce a convolutional classifier that provably has less than 5.8% test error for any adversarial attack with bounded $\ell_\infty$ norm less than $\epsilon = 0.1$).
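To make the "convex outer approximation" concrete, the following sketch writes down the standard elementwise linear relaxation of a ReLU whose pre-activation is known to lie in $[l, u]$. It is a simplified illustration of the ingredient the dual network exploits, not the paper's full dual LP construction, and the lower-bound slope shown is only one common fixed choice.

```python
import numpy as np

def relu_envelopes(l, u):
    # Elementwise linear envelopes  a_lo * z  <=  relu(z)  <=  a_up * z + b_up
    # valid for all z in [l, u]: exact where the unit is provably inactive (u <= 0)
    # or provably active (l >= 0), and a convex outer region otherwise.
    span = np.maximum(u - l, 1e-12)             # guard against division by zero
    a_up = np.where(u <= 0, 0.0, np.where(l >= 0, 1.0, u / span))
    b_up = np.where((l < 0) & (u > 0), -l * u / span, 0.0)
    # Lower bound: any slope in [0, 1] is sound for unstable units; reusing the
    # upper-bound slope is one common fixed choice.
    a_lo = a_up.copy()
    return a_lo, a_up, b_up

# Example: a unit with pre-activation bounds [-1, 3] gets the upper line
# relu(z) <= 0.75 * z + 0.75 and the lower line relu(z) >= 0.75 * z.
print(relu_envelopes(np.array([-1.0]), np.array([3.0])))
```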
Synthesizing Robust Adversarial Examples
Anish Athalye · Logan Engstrom · Andrew Ilyas · Kevin Kwok
Standard methods for generating adversarial examples for neural networks do not consistently fool neural network classifiers in the physical world due to a combination of viewpoint shifts, camera noise, and other natural transformations, limiting their relevance to real-world systems. We demonstrate the existence of robust 3D adversarial objects, and we present the first algorithm for synthesizing examples that are adversarial over a chosen distribution of transformations. We synthesize two-dimensional adversarial images that are robust to noise, distortion, and affine transformation. We apply our algorithm to complex three-dimensional objects, using 3D-printing to manufacture the first physical adversarial objects. Our results demonstrate the existence of 3D adversarial objects in the physical world.
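A minimal 2D sketch of the idea of optimizing over a distribution of transformations: the perturbation is trained to remain adversarial in expectation over sampled transformations rather than for a single view. The transformation list, targeted-loss formulation, $\ell_\infty$ constraint, and hyperparameters are illustrative assumptions; the 3D objects in the paper additionally require differentiable rendering from texture space.

```python
import random
import torch
import torch.nn.functional as F

def eot_attack(model, x, y_target, transforms, eps=0.05, steps=200, lr=0.01, samples=10):
    # Find x' close to x that is classified as y_target under a whole
    # distribution of transformations, not just the identity view.
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        loss = 0.0
        for _ in range(samples):                      # Monte-Carlo estimate of E_t[...]
            t = random.choice(transforms)             # t must be differentiable in its input
            logits = model(t(torch.clamp(x + delta, 0, 1)))
            loss = loss + F.cross_entropy(logits, y_target)
        opt.zero_grad()
        (loss / samples).backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                   # keep the perturbation small
    return torch.clamp(x + delta, 0, 1).detach()

# Usage sketch (hypothetical transformations on NCHW images):
# transforms = [lambda im: im, lambda im: torch.rot90(im, 1, (2, 3))]
# x_adv = eot_attack(model, x, y_target, transforms)
```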
Adversarial Risk and the Dangers of Evaluating Against Weak Attacks
Jonathan Uesato · Brendan O'Donoghue · Pushmeet Kohli · Aäron van den Oord
This paper investigates recently proposed approaches for defending against adversarial examples and evaluating adversarial robustness. We motivate \emph{adversarial risk} as an objective for achieving models robust to worst-case inputs. We then frame commonly used attacks and evaluation metrics as defining a tractable surrogate objective to the true adversarial risk. This suggests that models may optimize this surrogate rather than the true adversarial risk. We formalize this notion as \emph{obscurity to an adversary}, and develop tools and heuristics for identifying obscured models and designing transparent models. We demonstrate that this is a significant problem in practice by repurposing gradient-free optimization techniques into adversarial attacks, which we use to decrease the accuracy of several recently proposed defenses to near zero. Our hope is that our formulations and results will help researchers to develop more powerful defenses.
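As an illustration of repurposing gradient-free optimization into an attack, here is a hedged sketch of an SPSA-style attack that needs only black-box access to a scalar loss. The `loss_fn` callable, the sign-step update, and all hyperparameters are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def spsa_attack(loss_fn, x, eps=0.05, steps=100, lr=0.01, delta=0.01, batch=32):
    # Gradient-free attack: estimate the gradient of a black-box scalar loss
    # with SPSA (random Rademacher directions) and take signed ascent steps,
    # staying inside the l_inf ball of radius eps around the clean input x.
    # loss_fn(x) -> float, where larger values mean "more adversarial"
    # (e.g. a hypothetical margin between a wrong-class and the true-class logit).
    x0, adv = x.copy(), x.copy()
    for _ in range(steps):
        grad_est = np.zeros_like(x)
        for _ in range(batch):
            v = np.random.choice([-1.0, 1.0], size=x.shape)   # random +-1 direction
            df = loss_fn(adv + delta * v) - loss_fn(adv - delta * v)
            grad_est += (df / (2.0 * delta)) * v              # SPSA estimator (v_i = +-1)
        adv = np.clip(adv + lr * np.sign(grad_est / batch), x0 - eps, x0 + eps)
        adv = np.clip(adv, 0.0, 1.0)                          # stay a valid image
    return adv
```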