We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well. The basic idea is to consider a convex outer approximation of the set of activations reachable through a norm-bounded perturbation, and we develop a robust optimization procedure that minimizes the worst-case loss over this outer region (via a linear program). Crucially, we show that the dual problem to this linear program can itself be represented as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss. The end result is that by executing a few more forward and backward passes through a slightly modified version of the original network (though possibly with much larger batch sizes), we can learn a classifier that is provably robust to any norm-bounded adversarial attack. We illustrate the approach on a number of tasks to train classifiers with robust adversarial guarantees (e.g. for MNIST, we produce a convolutional classifier that provably has less than 5.8% test error for any adversarial attack with bounded $\ell_\infty$ norm less than $\epsilon = 0.1$).
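To make the dual-network idea concrete, here is a rough sketch of the certified lower bound for the simplest case: a one-hidden-layer ReLU network under an $\ell_\infty$ perturbation of radius $\epsilon$. The weights and dimensions below are hypothetical placeholders; the paper's full method handles deeper and convolutional networks and folds this bound into training, which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny two-layer ReLU network f(x) = W2 @ relu(W1 @ x + b1) + b2.
W1 = rng.standard_normal((4, 3)); b1 = rng.standard_normal(4)
W2 = rng.standard_normal((2, 4)); b2 = rng.standard_normal(2)

def f(x):
    return W2 @ np.maximum(W1 @ x + b1, 0) + b2

def dual_bound(x, c, eps):
    """Certified lower bound on min c^T f(x') over ||x' - x||_inf <= eps,
    computed by a backward pass through the "dual network" (one-hidden-layer
    instance of the construction sketched in the abstract)."""
    # Exact pre-activation bounds for the first layer (Holder's inequality).
    z_hat = W1 @ x + b1
    slack = eps * np.abs(W1).sum(axis=1)
    l, u = z_hat - slack, z_hat + slack

    # Backward pass: like backpropagation, but each unstable ReLU is replaced
    # by the slope of its convex (triangle) relaxation.
    nu2 = -c                       # dual variable at the output layer
    nu_hat1 = W2.T @ nu2           # back through W2
    d = np.where(l >= 0, 1.0, np.where(u <= 0, 0.0, u / (u - l)))
    nu1 = d * nu_hat1              # back through the relaxed ReLU
    nu_hat0 = W1.T @ nu1           # back through W1

    crossing = (l < 0) & (u > 0)   # unstable ReLUs contribute an extra term
    return (-nu2 @ b2 - nu1 @ b1 - x @ nu_hat0
            - eps * np.abs(nu_hat0).sum()
            + np.sum(l[crossing] * np.maximum(nu1[crossing], 0)))
```

When $\epsilon = 0$ the bound is tight (it equals $c^\top f(x)$), and for $\epsilon > 0$ it lower-bounds $c^\top f(x')$ for every perturbed input in the ball, which is what makes the resulting robust-loss guarantee certifiable rather than empirical.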
Author Information
Eric Wong (Carnegie Mellon University)
Zico Kolter (Carnegie Mellon University / Bosch Center for AI)
Related Events (a corresponding poster, oral, or spotlight)

2018 Poster: Provable Defenses against Adversarial Examples via the Convex Outer Adversarial Polytope »
Wed Jul 11th 06:15 – 09:00 PM Room Hall B
More from the Same Authors

2019 Poster: Certified Adversarial Robustness via Randomized Smoothing »
Jeremy Cohen · Elan Rosenfeld · Zico Kolter 
2019 Poster: Wasserstein Adversarial Examples via Projected Sinkhorn Iterations »
Eric Wong · Frank Schmidt · Zico Kolter 
2019 Oral: Wasserstein Adversarial Examples via Projected Sinkhorn Iterations »
Eric Wong · Frank Schmidt · Zico Kolter 
2019 Oral: Certified Adversarial Robustness via Randomized Smoothing »
Jeremy Cohen · Elan Rosenfeld · Zico Kolter 
2019 Poster: SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver »
Po-Wei Wang · Priya Donti · Bryan Wilder · Zico Kolter
2019 Poster: Adversarial camera stickers: A physical camera-based attack on deep learning systems »
Juncheng Li · Frank Schmidt · Zico Kolter 
2019 Oral: SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver »
Po-Wei Wang · Priya Donti · Bryan Wilder · Zico Kolter
2019 Oral: Adversarial camera stickers: A physical camera-based attack on deep learning systems »
Juncheng Li · Frank Schmidt · Zico Kolter 
2017 Poster: Input Convex Neural Networks »
Brandon Amos · Lei Xu · Zico Kolter 
2017 Poster: OptNet: Differentiable Optimization as a Layer in Neural Networks »
Brandon Amos · Zico Kolter 
2017 Poster: A Semismooth Newton Method for Fast, Generic Convex Programming »
Alnur Ali · Eric Wong · Zico Kolter 
2017 Talk: OptNet: Differentiable Optimization as a Layer in Neural Networks »
Brandon Amos · Zico Kolter 
2017 Talk: Input Convex Neural Networks »
Brandon Amos · Lei Xu · Zico Kolter 
2017 Talk: A Semismooth Newton Method for Fast, Generic Convex Programming »
Alnur Ali · Eric Wong · Zico Kolter