Session
Adversarial Examples
Adversarial Attacks on Node Embeddings via Graph Poisoning
Aleksandar Bojchevski · Stephan Günnemann
The goal of network representation learning is to learn low-dimensional node embeddings that capture the graph structure and are useful for solving downstream tasks. However, despite the proliferation of such methods, there is currently no study of their robustness to adversarial attacks. We provide the first adversarial vulnerability analysis of the widely used family of methods based on random walks. We derive efficient adversarial perturbations that poison the network structure and have a negative effect on both the quality of the embeddings and the downstream tasks. We further show that our attacks are transferable since they generalize to many models, and are successful even when the attacker is restricted. The code and the data are provided in the supplementary material.
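The paper derives efficient perturbations tailored to random-walk embedding methods; as a generic illustration of the graph-poisoning setting it studies, the sketch below greedily flips the edge whose flip most degrades a toy spectral embedding's separation of two known communities. The embedding, the surrogate quality score, and all names are assumptions for illustration, not the authors' attack.

import numpy as np

def spectral_embedding(A, dim=2):
    # Embed nodes with the leading eigenvectors of the adjacency matrix.
    vals, vecs = np.linalg.eigh(A)
    return vecs[:, np.argsort(-vals)[:dim]]

def quality(Z, labels):
    # Surrogate embedding quality: separation of the two community means.
    mu0, mu1 = Z[labels == 0].mean(0), Z[labels == 1].mean(0)
    return np.linalg.norm(mu0 - mu1)

def greedy_poison(A, labels, budget=5):
    # Repeatedly flip (add/remove) the single edge that hurts quality most.
    A = A.copy()
    n = A.shape[0]
    for _ in range(budget):
        best, best_score = None, np.inf
        for i in range(n):
            for j in range(i + 1, n):
                A[i, j] = A[j, i] = 1 - A[i, j]          # tentative flip
                score = quality(spectral_embedding(A), labels)
                if score < best_score:
                    best, best_score = (i, j), score
                A[i, j] = A[j, i] = 1 - A[i, j]          # undo flip
        i, j = best
        A[i, j] = A[j, i] = 1 - A[i, j]                  # commit best flip
    return A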
First-Order Adversarial Vulnerability of Neural Networks and Input Dimension
Carl-Johann Simon-Gabriel · Yann Ollivier · Leon Bottou · Bernhard Schölkopf · David Lopez-Paz
Over the past few years, neural networks have been proven vulnerable to adversarial images: targeted but imperceptible image perturbations lead to drastically different predictions. We show that adversarial vulnerability increases with the gradients of the training objective when viewed as a function of the inputs. Surprisingly, vulnerability does not depend on network topology: for most current network architectures, we prove that at initialization, the L1-norm of these gradients grows as the square root of the input dimension, leaving the networks increasingly vulnerable with growing image size. We empirically show that this dimension-dependence persists after either usual or robust training, but gets attenuated with higher regularization.
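The square-root-of-dimension scaling of the input gradient is easy to probe empirically. Below is a minimal sketch (PyTorch, with an arbitrary fully connected architecture and hyperparameters chosen purely for illustration, not the authors' setup) that measures the average L1-norm of the loss gradient with respect to the input for freshly initialized networks of growing input dimension.

import torch
import torch.nn as nn

def input_grad_l1(d, hidden=256, n_classes=10, n_samples=64):
    # Average L1-norm of d(loss)/d(input) for a freshly initialized MLP.
    net = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(),
                        nn.Linear(hidden, hidden), nn.ReLU(),
                        nn.Linear(hidden, n_classes))
    x = torch.randn(n_samples, d, requires_grad=True)
    y = torch.randint(0, n_classes, (n_samples,))
    # 'sum' keeps each sample's gradient unscaled by the batch size.
    loss = nn.functional.cross_entropy(net(x), y, reduction="sum")
    (grad,) = torch.autograd.grad(loss, x)
    return grad.abs().sum(dim=1).mean().item()

for d in [64, 256, 1024, 4096]:
    print(d, input_grad_l1(d))   # the paper predicts roughly sqrt(d) growth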
On Certifying Non-Uniform Bounds against Adversarial Attacks
Chen Liu · Ryota Tomioka · Volkan Cevher
This work studies the robustness certification problem for neural network models, which aims to find certified adversary-free regions that are as large as possible around data points. In contrast to existing approaches that seek regions bounded uniformly along all input features, we consider non-uniform bounds and use them to study the decision boundary of neural network models. We formulate our target as an optimization problem with nonlinear constraints. Then, a framework applicable to general feedforward neural networks is proposed to bound the output logits so that the relaxed problem can be solved by the augmented Lagrangian method. Our experiments show that the non-uniform bounds have larger volumes than uniform ones. Compared with normal models, robust models have even larger non-uniform bounds and better interpretability. Further, the geometric similarity of the non-uniform bounds gives a quantitative, data-agnostic metric of input features' robustness.
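For a linear classifier the certification constraint can be written in closed form, which makes the idea easy to illustrate: maximize the log-volume of an axis-aligned box of per-feature radii around the input while the worst-case class margins over the box stay non-negative. The sketch below uses a simple quadratic-penalty loop in place of the paper's augmented Lagrangian treatment of general feedforward networks; all function and variable names are illustrative, not the authors' code.

import numpy as np

def certify_nonuniform(W, b, x, steps=2000, lr=1e-2, rho=10.0):
    # Maximize sum(log eps) subject to worst-case margins over the box >= 0.
    # For a linear classifier f(x) = W x + b, the worst-case margin of the
    # true class c against class j under |delta_i| <= eps_i is
    #     (W[c]-W[j]) @ x + (b[c]-b[j]) - |W[c]-W[j]| @ eps.
    c = int(np.argmax(W @ x + b))
    dW = W[c] - np.delete(W, c, axis=0)           # (K-1, d) weight differences
    db = b[c] - np.delete(b, c)                   # (K-1,)
    clean_margin = dW @ x + db                    # margins at eps = 0
    u = np.full(x.shape, -3.0)                    # eps = exp(u) > 0
    for _ in range(steps):
        eps = np.exp(u)
        margins = clean_margin - np.abs(dW) @ eps
        viol = np.minimum(margins, 0.0)           # constraint violations
        # gradient of  -sum(u) + (rho/2) * sum(viol^2)  with respect to u
        grad = -1.0 + rho * (np.abs(dW).T @ (-viol)) * eps
        u -= lr * grad
    return np.exp(u)                              # per-feature certified radii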
Improving Adversarial Robustness via Promoting Ensemble Diversity
Tianyu Pang · Kun Xu · Chao Du · Ning Chen · Jun Zhu
Though deep neural networks have achieved significant progress on various tasks, often enhanced by model ensembles, existing high-performance models can be vulnerable to adversarial attacks. Many efforts have been devoted to enhancing the robustness of individual networks and then constructing a straightforward ensemble, e.g., by directly averaging the outputs, which ignores the interaction among networks. This paper presents a new method that explores the interaction among individual networks to improve the robustness of ensemble models. Technically, we define a new notion of ensemble diversity in the adversarial setting as the diversity among the non-maximal predictions of individual members, and present an adaptive diversity promoting (ADP) regularizer to encourage this diversity, which leads to globally better robustness for the ensemble by making adversarial examples difficult to transfer among individual members. Our method is computationally efficient and compatible with defense methods that act on individual networks. Empirical results on various datasets verify that our method can improve adversarial robustness while maintaining state-of-the-art accuracy on normal examples.
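A minimal PyTorch sketch of a diversity term in the spirit of the ADP regularizer is given below: the ensemble entropy plus the log-determinant of the Gram matrix of the members' normalized non-maximal predictions. The coefficients and the exact normalization are our reading of the idea and should be checked against the paper; during training this quantity would be subtracted from the sum of the members' cross-entropy losses so that greater diversity lowers the objective.

import torch
import torch.nn.functional as F

def adp_regularizer(logits_list, y, alpha=2.0, beta=0.5, eps=1e-20):
    # logits_list: list of K tensors of shape (batch, L), one per ensemble member.
    # y:           true labels, shape (batch,).
    probs = torch.stack([F.softmax(l, dim=1) for l in logits_list], dim=2)  # (B, L, K)
    ens = probs.mean(dim=2)                                                 # ensemble prediction
    entropy = -(ens * torch.log(ens + eps)).sum(dim=1)                      # H(ensemble)

    # Non-maximal predictions: drop the true-class entry, renormalize each column.
    B, L, K = probs.shape
    mask = torch.ones_like(ens, dtype=torch.bool)
    mask[torch.arange(B), y] = False
    nonmax = probs[mask].view(B, L - 1, K)
    nonmax = nonmax / (nonmax.norm(dim=1, keepdim=True) + eps)

    # Ensemble diversity: determinant of the Gram matrix of the K columns.
    gram = nonmax.transpose(1, 2) @ nonmax                                  # (B, K, K)
    ed = torch.det(gram)
    return (alpha * entropy + beta * torch.log(ed + eps)).mean()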
Adversarial camera stickers: A physical camera-based attack on deep learning systems
Juncheng Li · Frank R Schmidt · Zico Kolter
Recent work has thoroughly documented the susceptibility of deep learning systems to adversarial examples, but most such instances directly manipulate the digital input to a classifier. Although a smaller line of work has considered physical adversarial attacks, in all cases these involve manipulating the object of interest, i.e., putting a physical sticker on an object to misclassify it, or manufacturing an object specifically intended to be misclassified. In this work we consider an alternative question: is it possible to fool deep classifiers, over all perceived objects of a certain type, by physically manipulating the camera itself? We show that this is indeed possible: by placing a carefully crafted and mainly-translucent sticker over the lens of a camera, one can create universal perturbations of the observed images that are inconspicuous, yet reliably misclassify target objects as a different (targeted) class. To accomplish this, we propose an iterative procedure for updating both the attack perturbation (to make it adversarial for a given classifier) and the threat model itself (to ensure it is physically realizable). For example, we show that we can achieve physically-realizable attacks that fool ImageNet classifiers in a targeted fashion 49.6% of the time. This presents a new class of physically-realizable threat models to consider in the context of adversarially robust machine learning.
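The paper alternates between updating the perturbation and fitting the physical threat model (a printable pattern of translucent dots on the lens). The sketch below captures only the digital half in a generic form: a single universal overlay, alpha-composited onto a batch of images and optimized toward a target class, with clamped opacity as a crude stand-in for translucency. Names and constants are assumptions for illustration, not the authors' procedure.

import torch
import torch.nn.functional as F

def optimize_overlay(model, images, target, steps=200, lr=0.01, max_alpha=0.3):
    # images: (B, 3, H, W) batch of scenes containing the object of interest.
    # target: integer class the attacker wants everything classified as.
    _, _, H, W = images.shape
    color = torch.full((1, 3, H, W), 0.5, requires_grad=True)   # overlay colors
    alpha = torch.full((1, 1, H, W), 0.1, requires_grad=True)   # overlay opacity
    opt = torch.optim.Adam([color, alpha], lr=lr)
    y = torch.full((images.shape[0],), target, dtype=torch.long)
    for _ in range(steps):
        a = alpha.clamp(0.0, max_alpha)                          # keep it translucent
        composited = (1 - a) * images + a * color.clamp(0.0, 1.0)
        loss = F.cross_entropy(model(composited), y)             # push toward target class
        opt.zero_grad()
        loss.backward()
        opt.step()
    return color.detach().clamp(0.0, 1.0), alpha.detach().clamp(0.0, max_alpha)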
Adversarial examples from computational constraints
Sebastien Bubeck · Yin Tat Lee · Eric Price · Ilya Razenshteyn
Why are classifiers in high dimension vulnerable to “adversarial” perturbations? We show that it is likely not due to information theoretic limitations, but rather it could be due to computational constraints.
First we prove that, for a broad set of classification tasks, the mere existence of a robust classifier implies that it can be found by a possibly exponential-time algorithm with relatively few training examples. Then we give two particular classification tasks where learning a robust classifier is computationally intractable. More precisely, we construct two binary classification tasks in high-dimensional space which are (i) information-theoretically easy to learn robustly for large perturbations, (ii) efficiently learnable (non-robustly) by a simple linear separator, (iii) yet not efficiently robustly learnable, even for small perturbations. Specifically, for the first task hardness holds for any efficient algorithm in the statistical query (SQ) model, while for the second task we rule out any efficient algorithm under a cryptographic assumption. These examples give an exponential separation between classical learning and robust learning in the statistical query model or under a cryptographic assumption. This suggests that adversarial examples may be an unavoidable byproduct of computational limitations of learning algorithms.
POPQORN: Quantifying Robustness of Recurrent Neural Networks
CHING-YUN KO · Zhaoyang Lyu · Tsui-Wei Weng · Luca Daniel · Ngai Wong · Dahua Lin
The vulnerability to adversarial attacks has been a critical issue for deep neural networks. Addressing this issue requires a reliable way to evaluate the robustness of a network. Recently, several methods have been developed to compute robustness certification for neural networks, namely, certified lower bounds on the minimum adversarial perturbation. Such methods, however, were devised for feed-forward networks, e.g., multi-layer perceptrons or convolutional networks, and it remains an open problem to certify robustness for recurrent networks, especially LSTMs and GRUs. For such networks, there exist additional challenges in computing the robustness certification, such as handling the inputs at multiple steps and the interaction between gates and states. In this work, we propose POPQORN (Propagated-output Quantified Robustness for RNNs), a general algorithm to certify the robustness of RNNs, including vanilla RNNs, LSTMs, and GRUs. We demonstrate its effectiveness for different network architectures and show that the robustness certification on individual steps can lead to new insights.
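For contrast with the certificates POPQORN computes, here is the simplest possible alternative: plain interval bound propagation through a tanh vanilla RNN when every input step is perturbed within an l-infinity ball. This is not the POPQORN algorithm, which derives much tighter linear bounds to handle the gate and state interactions of LSTMs and GRUs; it is only meant to make the certification problem concrete, and all names are illustrative.

import numpy as np

def ibp_vanilla_rnn(Wx, Wh, b, x_seq, eps):
    # x_seq: (T, d) clean input sequence; each step may be perturbed within
    # an l_inf ball of radius eps. Returns elementwise lower/upper bounds
    # on the final hidden state.
    h_lo = h_hi = np.zeros(Wh.shape[0])
    for x in x_seq:
        # Bounds on the pre-activation Wx @ x' + Wh @ h + b for |x' - x| <= eps.
        x_term = Wx @ x
        x_slack = np.abs(Wx) @ np.full_like(x, eps)
        h_mid, h_rad = (h_hi + h_lo) / 2, (h_hi - h_lo) / 2
        h_term = Wh @ h_mid
        h_slack = np.abs(Wh) @ h_rad
        lo = x_term - x_slack + h_term - h_slack + b
        hi = x_term + x_slack + h_term + h_slack + b
        h_lo, h_hi = np.tanh(lo), np.tanh(hi)   # tanh is monotone
    return h_lo, h_hi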
Using Pre-Training Can Improve Model Robustness and Uncertainty
Dan Hendrycks · Kimin Lee · Mantas Mazeika
Tuning a pre-trained network is commonly thought to improve data efficiency. However, He et al. (2018) have called the utility of pre-training into question by showing that training from scratch can often yield similar performance if the model is trained long enough. We show that although pre-training may not improve performance on traditional classification metrics, it does provide large benefits to model robustness and uncertainty. Through extensive experiments on label corruption, class imbalance, adversarial examples, out-of-distribution detection, and confidence calibration, we demonstrate large gains from pre-training and complementary effects with task-specific methods. Results include a 30% relative improvement in label noise robustness and a 10% absolute improvement in adversarial robustness on both CIFAR-10 and CIFAR-100. In some cases, using pre-training without task-specific methods surpasses the state of the art, highlighting the importance of using pre-training when evaluating future methods on robustness and uncertainty tasks.
Generalized No Free Lunch Theorem for Adversarial Robustness
Elvis Dohmatob
This manuscript presents some new impossibility results on adversarial robustness in machine learning, a very important yet largely open problem. We show that if, conditioned on a class label, the data distribution satisfies the $W_2$ Talagrand transportation-cost inequality (for example, this condition is satisfied if the conditional distribution has a log-concave density, or is the uniform measure on a compact Riemannian manifold with positive Ricci curvature), then any classifier can be adversarially fooled with high probability once the perturbations are slightly greater than the natural noise level in the problem. We call this result the Strong "No Free Lunch" Theorem, since some recent results on the subject (Tsipras et al. 2018, Fawzi et al. 2018, etc.) can be immediately recovered as very particular cases. Our theoretical bounds are demonstrated on both simulated and real data (MNIST). We conclude the manuscript with some speculation on possible future research directions.
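For reference, the transportation-cost condition assumed on each class-conditional distribution $\mu$ is the standard Talagrand $W_2$ inequality: there is a constant $c > 0$ such that, for every probability measure $\nu$,

$$W_2(\nu, \mu) \le \sqrt{2c\,\mathrm{KL}(\nu \,\|\, \mu)},$$

where $W_2$ denotes the quadratic Wasserstein distance. A Gaussian satisfies this inequality with $c$ equal to its variance, which is one way to see how the constant in the theorem connects to the natural noise level mentioned above.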
PROVEN: Verifying Robustness of Neural Networks with a Probabilistic Approach
Tsui-Wei Weng · Pin-Yu Chen · Lam Nguyen · Mark Squillante · Akhilan Boopathy · Ivan Oseledets · Luca Daniel
With the prevalence of deep neural networks, quantifying their robustness to adversarial inputs has become an important area of research. However, most of the current research literature merely focuses on the worst-case setting, computing certified lower bounds on the minimum adversarial distortion when the input perturbations are constrained within an $\ell_p$ ball, and thus lacks robustness assessment beyond the certified range. In this paper, we provide a first look at a probabilistically certifiable setting where the perturbation can follow a given distributional characterization. We propose PROVEN, a novel framework to PRObabilistically VErify a Neural network's robustness with statistical guarantees -- i.e., PROVEN certifies the probability that the classifier's top-1 prediction cannot be altered under any constrained $\ell_p$-norm perturbation to a given input. Notably, PROVEN is derived from a closed-form analysis of current state-of-the-art worst-case neural network robustness verification frameworks, and therefore it can provide probabilistic certificates with little computational overhead on top of existing methods such as Fast-Lin, CROWN, and CNN-Cert. Experiments on small and large MNIST and CIFAR neural network models demonstrate that our probabilistic approach can tighten the robustness certification by up to around $1.8 \times$ and $3.5 \times$, with at least $99.99\%$ confidence, compared with the worst-case robustness certificates delivered by CROWN and CNN-Cert.
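The quantity PROVEN certifies analytically can also be estimated naively by sampling, which is a useful mental model even though sampling yields an estimate rather than a certificate. A minimal sketch, assuming model(x) returns the logits for a single input and using a uniform perturbation inside an l-infinity ball purely for concreteness:

import numpy as np

def empirical_robustness_probability(model, x, eps, n_samples=10000, rng=None):
    # Monte Carlo estimate of P[ top-1 prediction unchanged ] when the
    # perturbation is drawn from a given distribution inside an l_inf ball
    # of radius eps. PROVEN bounds this probability in closed form on top of
    # frameworks such as Fast-Lin, CROWN, and CNN-Cert; here we only sample.
    rng = rng or np.random.default_rng(0)
    clean_label = int(np.argmax(model(x)))
    unchanged = 0
    for _ in range(n_samples):
        delta = rng.uniform(-eps, eps, size=x.shape)
        if int(np.argmax(model(x + delta))) == clean_label:
            unchanged += 1
    return unchanged / n_samples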