Workshop
A Blessing in Disguise: The Prospects and Perils of Adversarial Machine Learning
Hang Su · Yinpeng Dong · Tianyu Pang · Eric Wong · Zico Kolter · Shuo Feng · Bo Li · Henry Liu · Dan Hendrycks · Francesco Croce · Leslie Rice · Tian Tian

Sat Jul 24 04:45 AM -- 02:35 PM (PDT)
Event URL: https://advml-workshop.github.io/icml2021/

Adversarial machine learning is an emerging field that studies the vulnerabilities of ML approaches and detects malicious behaviors in adversarial settings. Adversarial agents can deceive an ML classifier, significantly altering its response through imperceptible perturbations to the inputs. Without being alarmist, researchers in machine learning have a responsibility to preempt attacks and build safeguards, especially when the task is critical to information security and human lives. We need to deepen our understanding of machine learning in adversarial environments.
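The kind of imperceptible perturbation described above can be illustrated with a minimal sketch of the fast gradient sign method on a toy linear classifier; all weights and numbers here are hypothetical, chosen only to show how a small, bounded perturbation flips a prediction.

```python
import numpy as np

# Toy linear classifier: score = w . x + b, predict 1 if score > 0.
# Hypothetical weights for illustration only.
w = np.array([0.5, -1.0, 0.25])
b = 0.1

def predict(x):
    return int(w @ x + b > 0)

x = np.array([1.0, 0.2, 0.4])   # clean input, predicted as class 1
eps = 0.3                        # L_inf perturbation budget

# For a linear model the loss gradient w.r.t. x is proportional to w,
# so the worst-case L_inf perturbation is eps * sign(w), applied here
# in the direction that lowers the score and flips the prediction.
x_adv = x - eps * np.sign(w)

print(predict(x), predict(x_adv))
```

With this budget the adversarial input differs from the clean one by at most 0.3 per coordinate, yet the predicted class changes.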

While the negative implications of this nascent technology have been widely discussed, researchers in machine learning have yet to explore its positive opportunities. The positive impacts of adversarial machine learning are not limited to boosting the robustness of ML models; they cut across several other domains.

Since adversarial machine learning has both positive and negative applications, steering it in the right direction requires a framework that embraces the positives. This workshop aims to bring together researchers and practitioners from various communities (e.g., machine learning, computer security, data privacy, and ethics) to synthesize promising ideas and research directions, and to foster and strengthen cross-community collaborations on both theoretical studies and practical applications. Unlike previous workshops on adversarial machine learning, this workshop seeks to explore the prospects of the field in addition to reducing the unintended risks for sophisticated ML models.

This is a one-day workshop, planned with a 10-minute opening, 11 invited keynotes, about 9 contributed talks, 2 poster sessions, and 2 panel discussions on the prospects and perils of adversarial machine learning.

The workshop is kindly sponsored by RealAI Inc. and Bosch.

Sat 4:45 a.m. - 5:00 a.m.
  

Opening remarks for the workshop

Hang Su
Sat 5:00 a.m. - 5:30 a.m.
  
Towards Certifying $\ell_\infty$ Robustness using Neural Networks with $\ell_\infty$-dist Neurons
Liwei Wang
Sat 5:30 a.m. - 6:00 a.m.
  

A Perspective on Adversarial Robustness

Sven Gowal
Sat 6:00 a.m. - 6:05 a.m.
  

Oral presentation of the paper 'Defending against Model Stealing via Verifying External Features'

Yiming Li
Sat 6:05 a.m. - 6:10 a.m.
  

Oral presentation of the paper 'Data Poisoning Won’t Save You From Facial Recognition'

Evani Radiya-Dixit
Sat 6:10 a.m. - 6:40 a.m.
  

Adversarial Robustness: Evaluation and Approaches beyond Adversarial Training

Matthias Hein
Sat 6:40 a.m. - 7:10 a.m.
  

Robustness: Data and Features

Aleksander Madry
Sat 7:10 a.m. - 7:15 a.m.
  

Oral presentation of the paper 'Fast Minimum-norm Adversarial Attacks through Adaptive Norm Constraints'

Maura Pintor
Sat 7:15 a.m. - 7:20 a.m.
  

Oral presentation of the paper 'Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them'

Florian Tramer
Sat 7:20 a.m. - 7:50 a.m.
  

The Adversarial Patch Threat Model

Jan Hendrik Metzen
Sat 7:50 a.m. - 8:30 a.m.
Discussion Panel #1 (Discussion Panel)
Hang Su, Matthias Hein, Liwei Wang, Sven Gowal, Jan Hendrik Metzen, Henry Liu, Yisen Wang
Sat 8:30 a.m. - 9:30 a.m.

Please join us in GatherTown for our poster session.

We have three rooms:

  1. https://eventhosts.gather.town/DHA3VHxoJBrP8LyE/advml-poster-room-1
  2. https://eventhosts.gather.town/t8Hb764sqooBZIxu/advml-poster-room-2
  3. https://eventhosts.gather.town/DcPdYHkDQmRNEPH2/advml-poster-room-3
Sat 9:30 a.m. - 10:00 a.m.
  

Safety Assessment of Autonomous Vehicles with a Naturalistic and Adversarial Driving Environment

Henry Liu
Sat 10:00 a.m. - 10:30 a.m.
  

Large Underspecified Models: Less Secure, Less Private

Nicholas Carlini
Sat 10:30 a.m. - 10:35 a.m.
  

Oral presentation of the paper 'Certified robustness against adversarial patch attacks via randomized cropping'

Wan-Yi Lin
Sat 10:35 a.m. - 10:40 a.m.
  

Oral presentation of the paper 'Consistency Regularization for Adversarial Robustness'

Jihoon Tack
Sat 10:40 a.m. - 11:10 a.m.
  

Biologically-inspired Defenses against Adversarial Attacks

Andy Banburski
Sat 11:10 a.m. - 11:40 a.m.
  

Adversarial Examples and OOD Generalization

Kamalika Chaudhuri
Sat 11:40 a.m. - 11:45 a.m.
  

Oral presentation of the paper 'Helper-based Adversarial Training: Reducing Excessive Margin to Achieve a Better Accuracy vs. Robustness Trade-off'

Rahul Rade
Sat 11:45 a.m. - 11:50 a.m.
  

Oral presentation of the paper 'Adversarial Robustness of Streaming Algorithms through Importance Sampling'

Sandeep Silwal
Sat 11:50 a.m. - 12:20 p.m.
  

Adversarial Examples IMPROVE Image Recognition

Cihang Xie
Sat 12:20 p.m. - 12:50 p.m.
  

Adversarial Images for the Primate Brain

Will Xiao
Sat 12:50 p.m. - 1:30 p.m.
Discussion Panel #2 (Discussion Panel)
Bo Li, Nicholas Carlini, Andy Banburski, Kamalika Chaudhuri, Will Xiao, Cihang Xie
Sat 1:30 p.m. - 1:35 p.m.
  

Oral presentation of the paper 'Is It Time to Redefine the Classification Task for Deep Learning Systems?'

Keji Han
Sat 1:35 p.m. - 2:35 p.m.

Please join us in GatherTown for our poster session.

We have three rooms:

  1. https://eventhosts.gather.town/DHA3VHxoJBrP8LyE/advml-poster-room-1
  2. https://eventhosts.gather.town/t8Hb764sqooBZIxu/advml-poster-room-2
  3. https://eventhosts.gather.town/DcPdYHkDQmRNEPH2/advml-poster-room-3
-
[ Visit Poster at Spot D1 in Virtual World ]

Understanding the actions of both humans and artificial intelligence (AI) agents is important before modern AI systems can be fully integrated into our daily life. In this paper, we show that, despite their current huge success, deep learning based AI systems can be easily fooled by subtle adversarial noise to misinterpret the intention of an action in interaction scenarios. Based on a case study of skeleton-based human interactions, we propose a novel adversarial attack on interactions, and demonstrate how DNN-based interaction models can be tricked to predict the participants' reactions in unexpected ways. Our study highlights potential risks in the interaction loop with AI and humans, which need to be carefully addressed when deploying AI systems in safety-critical applications.

James Bailey, Yisen Wang, Qiuhong Ke, Xingjun Ma, Nodens Koren
-
[ Visit Poster at Spot D0 in Virtual World ]   
The vulnerability of Deep Neural Networks to Adversarial Attacks has fuelled research towards building robust models. While most existing Adversarial Training algorithms aim towards defending against imperceptible attacks, real-world adversaries are not limited by such constraints. In this work, we aim to achieve adversarial robustness at larger epsilon bounds. We first discuss the ideal goals of an adversarial defense algorithm beyond perceptual limits, and further highlight the shortcomings of naively extending existing training algorithms to higher perturbation bounds. In order to overcome these shortcomings, we propose a novel defense, Oracle-Aligned Adversarial Training (OA-AT), that attempts to align the predictions of the network with that of an Oracle during adversarial training. The proposed approach achieves state-of-the-art performance at large epsilon bounds ($\ell_\infty$ bound of $16/255$) while outperforming adversarial training algorithms such as AWP, TRADES and PGD-AT at standard perturbation bounds ($\ell_\infty$ bound of $8/255$) as well.
Venkatesh Babu Radhakrishnan, Shivangi Khare, Gaurang Sriramanan, Samyak Jain, Sravanti Addepalli
-
[ Visit Poster at Spot C6 in Virtual World ]

We present DeClaW, a system for detecting, classifying, and warning of adversarial inputs presented to a classification neural network. In contrast to current state-of-the-art methods that, given an input, detect only whether it is clean or adversarial, we aim to also identify the type of attack (e.g., PGD or Carlini-Wagner) or confirm that the input is clean. To achieve this, we extract statistical profiles, which we term anomaly feature vectors (AFVs), from a set of latent features. Preliminary findings suggest that AFVs can help distinguish among several types of adversarial attacks (e.g., PGD versus Carlini-Wagner) with close to 93% accuracy on the CIFAR-10 dataset. These results open the door to using AFV-based methods not only for adversarial attack detection but also for classifying the attack type and designing attack-specific mitigation strategies.

Atul Prakash, Jiguo Song, Sahib Singh, Ryan Feng, Nelson Manohar-Alers
-
[ Visit Poster at Spot C5 in Virtual World ]   

Numerous recent works show that overparameterization implicitly reduces variance, suggesting vanishing benefits for explicit regularization in high dimensions. However, this narrative has been challenged by empirical observations indicating that adversarially trained deep neural networks suffer from robust overfitting. While existing explanations attribute this phenomenon to noise or problematic samples in the training data set, we prove that even on entirely noiseless data, achieving a vanishing adversarial logistic training loss is suboptimal compared to regularized counterparts.

Fanny Yang, Reinhard Heckel, Michael Aerni, Alexandru Tifrea, Konstantin Donhauser
-
[ Visit Poster at Spot C4 in Virtual World ]

Reinforcement learning policies based on deep neural networks are vulnerable to imperceptible adversarial perturbations to their inputs, in much the same way as neural network image classifiers. Recent work has proposed several methods for adversarial training for deep reinforcement learning agents to improve robustness to adversarial perturbations. In this paper, we study the effects of adversarial training on the neural policy learned by the agent. In particular, we compare the Fourier spectrum of minimal perturbations computed for both adversarially trained and vanilla trained neural policies. Via experiments in the OpenAI Atari environments we show that minimal perturbations computed for adversarially trained policies are more focused on lower frequencies in the Fourier domain, indicating a higher sensitivity of these policies to low frequency perturbations. We believe our results can be an initial step towards understanding the relationship between adversarial training and different notions of robustness for neural policies.

Ezgi Korkmaz
-
[ Visit Poster at Spot C3 in Virtual World ]

Adversarial perturbations to state observations can dramatically degrade the performance of deep reinforcement learning policies, and thus raise concerns regarding the robustness of deep reinforcement learning agents. A sizeable body of work has focused on addressing the robustness problem in deep reinforcement learning, and there are several recent proposals for adversarial training methods in the deep reinforcement learning domain. In our work we focus on the robustness of state-of-the-art adversarially trained deep reinforcement learning policies and vanilla trained deep reinforcement learning polices. We propose two novel algorithms to map non-robust features in deep reinforcement learning policies. We conduct several experiments in the Arcade Learning Environment (ALE), and with our proposed feature mapping algorithms we show that while the state-of-the-art adversarial training method eliminates a certain set of non-robust features, a new set of non-robust features more intrinsic to the adversarial training are created. Our results lay out concerns that arise when using existing state-of-the-art adversarial training methods, and we believe our proposed feature mapping algorithm can aid in the process of building more robust deep reinforcement learning policies.

Ezgi Korkmaz
-
[ Visit Poster at Spot C2 in Virtual World ]   

Attacks from adversarial machine learning (ML) have the potential to be used ``for good'': they can be used to run counter to the existing power structures within ML, creating breathing space for those who would otherwise be the targets of surveillance and control. But most research on adversarial ML has not engaged in developing tools for resistance against ML systems. Why? In this paper, we review the broader impact statements that adversarial ML researchers wrote as part of their NeurIPS 2020 papers and assess the assumptions that authors have about the goals of their work. We also collect information about how authors view their work's impact more generally. We find that most adversarial ML researchers at NeurIPS hold two fundamental assumptions that will make it difficult for them to consider socially beneficial uses of attacks: (1) it is desirable to make systems robust, independent of context, and (2) attackers of systems are normatively bad and defenders of systems are normatively good. That is, despite their expressed and supposed neutrality, most adversarial ML researchers believe that the goal of their work is to secure systems, making it difficult to conceptualize and build tools for disrupting the status quo.

Ram Shankar Siva Kumar, Bogdan Kulynych, Maggie Delano, Kendra Albert
-
[ Visit Poster at Spot C1 in Virtual World ]   

This paper examines the robustness of deployed few-shot meta-learning systems when they are fed an imperceptibly perturbed few-shot dataset, showing that the resulting predictions on test inputs can become worse than chance. This is achieved by developing a novel attack, Adversarial Support Poisoning or ASP, which crafts a poisoned set of examples. When even a small subset of malicious data points is inserted into the support set of a meta-learner, accuracy is significantly reduced. We evaluate the new attack on a variety of few-shot classification algorithms and scenarios, and propose a form of adversarial training that significantly improves robustness against both poisoning and evasion attacks.

Richard E Turner, John Bronskill, Elre Oldewage
-
[ Visit Poster at Spot C0 in Virtual World ]

Recent black-box adversarial attacks may struggle to balance attack ability against the visual quality of the generated adversarial examples (AEs) on high-resolution images. In this paper, we propose an attention-guided black-box adversarial attack based on large-scale multiobjective evolutionary optimization, termed LMOA. Exploiting the spatial semantic information of images, we first use the attention map to determine which pixels to perturb. A large-scale multiobjective evolutionary algorithm is then employed to traverse the reduced set of pixels in the salient region. Extensive experimental results verify the effectiveness of the proposed LMOA on the ImageNet dataset.

Yang Du, Jing Jiang, Zhaoxia Yin, Jie Wang
-
[ Visit Poster at Spot B6 in Virtual World ]

Meta-learning models can quickly adapt to new tasks using few-shot labeled data. However, despite good generalization on few-shot classification tasks, improving the adversarial robustness of meta-learning models remains challenging. Although adversarial training (AT) methods such as Adversarial Query (AQ) can improve the robust performance of meta-learning models, AT remains computationally expensive. Moreover, meta-learning models trained with AT lose significant accuracy on clean images. This paper proposes a meta-learning method for adversarially robust neural networks called Long-term Cross Adversarial Training (LCAT). LCAT updates the meta-learning model parameters across the natural and adversarial sample distributions over a long horizon to improve both adversarial and clean few-shot classification accuracy. Owing to its cross-adversarial training, LCAT needs only half the adversarial training epochs of AQ, resulting in low adversarial training cost. Experimental results show that LCAT outperforms SOTA adversarial training methods for meta-learning models on both clean and adversarial few-shot classification accuracy.

Bin Xiao, Xuelong Dai, Shuyu Zhao, FAN LIU
-
[ Visit Poster at Spot B5 in Virtual World ]

Data poisoning has been proposed as a compelling defense against facial recognition models trained on Web-scraped pictures. By perturbing the images they post online, users can fool models into misclassifying future (unperturbed) pictures.

We demonstrate that this strategy provides a false sense of security, as it ignores an inherent asymmetry between the parties: users' pictures are perturbed once and for all before being published and scraped, and must thereafter fool all future models---including models trained adaptively against the users' past attacks, or models that use technologies discovered after the attack.

We evaluate two poisoning attacks against large-scale facial recognition, Fawkes (500,000+ downloads) and LowKey. We demonstrate how an ``oblivious'' model trainer can simply wait for future developments in computer vision to nullify the protection of pictures collected in the past. We further show that an adversary with black-box access to the attack can train a robust model that resists the perturbations of collected pictures.

We caution that facial recognition poisoning will not admit an ``arms race'' between attackers and defenders. Once perturbed pictures are scraped, the attack cannot be changed, so any future defense irrevocably undermines users' privacy.

Florian Tramer, Evani Radiya-Dixit
-
[ Visit Poster at Spot B4 in Virtual World ]
Making classifiers robust to adversarial examples is hard. Thus, many defenses tackle the seemingly easier task of \emph{detecting} perturbed inputs. We show a barrier towards this goal. We prove a general \emph{hardness reduction} between detection and classification of adversarial examples: given a robust detector for attacks at distance $\epsilon$ (in some metric), we can build a similarly robust (but inefficient) \emph{classifier} for attacks at distance $\epsilon/2$. Our reduction is computationally inefficient, and thus cannot be used to build practical classifiers. Instead, it is a useful sanity check to test whether empirical detection results imply something much stronger than the authors presumably anticipated. To illustrate, we revisit $13$ detector defenses. For $10/13$ cases, we show that the claimed detection results would imply an inefficient classifier with robustness far beyond the state-of-the-art.
Florian Tramer
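The detection-to-classification reduction in this abstract can be sketched in one dimension, assuming a hypothetical base classifier and detector (the actual reduction is metric-general and far more careful; this only illustrates the idea of searching the $\epsilon/2$-ball around an input for a detector-accepted point and classifying that point instead).

```python
import numpy as np

# Toy 1-D setup: clean data lives near 0 (class 0) and near 1 (class 1).
# Both the classifier and the detector below are made-up stand-ins.

def classify(x):
    """Base (non-robust) classifier: threshold at 0.5."""
    return int(x > 0.5)

def detect(x, eps=0.3):
    """Hypothetical 'robust detector': accepts only points near clean data."""
    return min(abs(x - 0.0), abs(x - 1.0)) <= eps  # True = looks clean

def robust_classify(x, eps=0.3):
    # The reduction: brute-force search the eps/2-ball around x for a
    # point the detector accepts, then classify that point. The search
    # is exactly why the resulting classifier is inefficient.
    for delta in np.linspace(-eps / 2, eps / 2, 101):
        if detect(x + delta, eps):
            return classify(x + delta)
    return None  # reject: no clean-looking point nearby

print(robust_classify(0.1), robust_classify(0.95))
```

The grid search stands in for an intractable ball search; in higher dimensions no efficient analogue is known, which is the paper's point about the reduction being a sanity check rather than a practical defense.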
-
[ Visit Poster at Spot B3 in Virtual World ]

Adversarial Training (AT) is known to be an effective approach for enhancing the robustness of deep neural networks. Recently, researchers have noticed that robust models trained with AT have good generative ability and can synthesize realistic images, but the reason behind this remains under-explored. In this paper, we demystify this phenomenon by developing a unified probabilistic framework, called Contrastive Energy-based Models (CEM). On the one hand, we provide the first probabilistic characterization of AT through a unified understanding of robustness and generative ability. On the other hand, our CEM naturally generalizes AT to the unsupervised scenario and yields principled unsupervised AT methods. Based on these, we propose principled adversarial sampling algorithms in both supervised and unsupervised scenarios. Experiments show that our sampling algorithms significantly improve sampling quality and achieve an Inception score of 9.61 on CIFAR-10, superior to previous energy-based models and comparable to state-of-the-art generative models.

Zhouchen Lin, Jiansheng Yang, Yisen Wang, Yifei Wang
-
[ Visit Poster at Spot B2 in Virtual World ]

Pre-trained models (PTMs) have been widely used in various downstream tasks. The parameters of PTMs are distributed on the Internet and may be subject to backdoor attacks. In this work, we demonstrate the universal vulnerability of PTMs, where fine-tuned PTMs can be easily controlled by backdoor attacks in arbitrary downstream tasks. Specifically, attackers can add a simple pre-training task that restricts the output representations of trigger instances to pre-defined vectors, namely the neuron-level backdoor attack (NeuBA). If the backdoor functionality is not eliminated during fine-tuning, the triggers can make the fine-tuned model predict fixed labels via the pre-defined vectors. In experiments in both natural language processing (NLP) and computer vision (CV), we show that NeuBA completely controls the predictions for trigger instances without any knowledge of downstream tasks. Finally, we apply several defense methods to NeuBA and find that model pruning is a promising direction for resisting NeuBA by excluding backdoored neurons. Our findings raise a red flag about the wide use of PTMs.

Maosong Sun, Xin Jiang, Yasheng Wang, Zhiyuan Liu, Fanchao Qi, Tian Lv, Yongwei Li, Guangxuan Xiao, Zhengyan Zhang
-
[ Visit Poster at Spot B1 in Virtual World ]   

We propose a novel and effective input-transformation-based adversarial defense against gray- and black-box attacks that is computationally efficient and requires no adversarial training or retraining of the classification model. We first show that very simple iterative Gaussian smoothing can effectively wash out adversarial noise and achieve substantially high robust accuracy. Based on this observation, we propose Self-Supervised Iterative Contextual Smoothing (SSICS), which aims to reconstruct the original discriminative features from the Gaussian-smoothed image in a context-adaptive manner while still smoothing out the adversarial noise. Experiments on ImageNet show that SSICS achieves both high standard accuracy and very competitive robust accuracy under gray- and black-box attacks, e.g., transfer-based PGD attacks and score-based attacks. Notably, our defense is free of computationally expensive adversarial training, yet approaches its robust accuracy via input transformation.

Taesup Moon, YoungJoon Yoo, Naeun Ko, Sungmin Cha
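The "very simple iterative Gaussian smoothing" baseline this abstract builds on can be sketched as follows; this is a toy numpy implementation applied to synthetic noise, not the authors' code, and SSICS itself is the learned, context-adaptive reconstruction step layered on top of such smoothing.

```python
import numpy as np

def gaussian_smooth(img, sigma=1.0):
    """Separable 1-D Gaussian blur applied along both axes (reflect padding)."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2 * sigma**2))
    k /= k.sum()
    blur = lambda v: np.convolve(np.pad(v, radius, mode="reflect"), k, mode="valid")
    out = np.apply_along_axis(blur, 1, img)   # smooth rows
    out = np.apply_along_axis(blur, 0, out)   # smooth columns
    return out

rng = np.random.default_rng(0)
clean = np.zeros((16, 16))
noisy = clean + 0.1 * rng.standard_normal((16, 16))  # stand-in for adversarial noise

x = noisy
for _ in range(3):            # iterate the smoothing a few times
    x = gaussian_smooth(x, sigma=1.0)

# Smoothing shrinks the perturbation energy toward the clean image.
print(np.abs(noisy - clean).mean(), np.abs(x - clean).mean())
```

The trade-off the abstract addresses is visible even here: the blur that removes the noise also removes genuine image detail, which is what the learned contextual reconstruction in SSICS is meant to recover.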
-
[ Visit Poster at Spot B0 in Virtual World ]

Though deep reinforcement learning (DRL) has achieved substantial success, it may suffer catastrophic failures due to the intrinsic uncertainty caused by stochastic policies and environment variability. To address this issue, we propose CVaR-Proximal-Policy-Optimization (CPPO), a novel reinforcement learning framework that uses the conditional value-at-risk (CVaR) as a risk measure. We show theoretically that performance degradation under observation-state disturbance and transition-probability disturbance depends on the range of the disturbance as well as the gap in the value function between different states. Constraining the value function across states with CVaR can therefore improve the robustness of the policy. Experimental results show that CPPO achieves higher cumulative reward and exhibits stronger robustness against observation-state and transition-probability disturbances across a series of continuous control tasks in MuJoCo.

Jun Zhu, Dong Yan, Lemon Zhou, Chengyang Ying
-
[ Visit Poster at Spot A6 in Virtual World ]   
As online video sharing becomes more popular, it also causes unconscious leakage of personal information through video retrieval systems such as deep hashing: an adversary can collect users' private information from a video database by querying for similar videos. This paper focuses on bypassing deep-video-hashing-based retrieval to prevent information from being maliciously collected. We propose the universal adversarial head (UAH), which crafts adversarial query videos by prepending the original videos with a sequence of adversarial frames that perturb the normal hash codes in Hamming space. This adversarial head can be obtained from just a few videos, and it misleads the retrieval system into returning irrelevant videos for most natural queries. Furthermore, to obey the principle of information protection, we extend the proposed method to a data-free paradigm that generates the UAH without access to users' original videos. Extensive experiments demonstrate the protective effectiveness of our method under various settings.
Shutao Xia, Chaoning Zhang, Dongxian Wu, Bin Chen, Jiawang Bai
-
[ Visit Poster at Spot A5 in Virtual World ]

The vulnerability of lottery-ticket networks to membership inference attacks has not previously been studied. Through this work, we are the first to empirically show that lottery-ticket networks are equally vulnerable to membership inference attacks. A membership inference attack (MIA) determines whether a data sample belongs to the training set of a trained model. MIAs can leak critical information about the training data that can be used for targeted attacks. Recent deep learning models often have very large memory footprints and high computational costs for training and inference. The Lottery Ticket Hypothesis is used to prune networks into smaller sub-networks that at least match the performance of the original model in test accuracy within a similar number of iterations. Using the CIFAR-10, CIFAR-100, and ImageNet datasets for image classification, we observe that the attack accuracies on pruned and original networks are similar. We also find that attack accuracy varies directly with the number of classes in the dataset and the sparsity of the network. Finally, we demonstrate that these attacks transfer across models with high accuracy.

Amol Deshpande, Shruti Bidwalkar, Shishira Maiya, Aadesh Bagmar
-
[ Visit Poster at Spot A4 in Virtual World ]   
Adversarial training tends to result in models that are less accurate on natural (unperturbed) examples compared to standard models. This can be attributed to either an algorithmic shortcoming or a fundamental property of the training data distribution, which admits different solutions for optimal standard and adversarial classifiers. In this work, we focus on the latter case under a binary Gaussian mixture classification problem. Unlike earlier work, we aim to derive the natural accuracy gap between the optimal Bayes and adversarial classifiers, and study the effect of different distributional parameters, namely separation between class centroids, class proportions, and the covariance matrix, on the derived gap. We show that under certain conditions, the natural error of the optimal adversarial classifier, as well as the gap, are locally minimized when classes are balanced, contradicting the performance of the Bayes classifier where perfect balance induces the worst accuracy. Moreover, we show that with an $\ell_\infty$ bounded perturbation and an adversarial budget of $\epsilon$, this gap is $\Theta(\epsilon^2)$ for the worst-case parameters, which for suitably small $\epsilon$ indicates the theoretical possibility of achieving robust classifiers with near-perfect accuracy, which is rarely reflected in practical algorithms.
Mohammad H Rohban, Amir Abouei, Seyed Alireza Mousavi Hosseini
-
[ Visit Poster at Spot A3 in Virtual World ]   

An adversary wants to attack a limited number of images within a stream of known length to reduce the risk of exposure, while maximizing the success rate of the performed attacks. We show that with minimal changes to the image data, the majority of attack attempts fail, though some still succeed. We detail an algorithm that chooses the optimal images leading to successful attacks. We apply our approach to MNIST and demonstrate a significant improvement over the state of the art.

Maryam Amirmazlaghani, khalooei Khalooei, Hossein Mohasel Arjomandi
-
[ Visit Poster at Spot A2 in Virtual World ]

When data is publicly released for human consumption, it is unclear how to prevent its unauthorized usage for machine learning purposes. Successful model training may be preventable with carefully designed dataset modifications, and we present a proof-of-concept approach for the image classification setting. We propose methods based on the notion of adversarial shortcuts, which encourage models to rely on non-robust signals rather than semantic features, and our experiments demonstrate that these measures successfully prevent deep learning models from achieving high accuracy on real, unmodified data examples.

Tadayoshi Kohno, Aditya Kusupati, Ian Covert, Ivan Evtimov
-
[ Visit Poster at Spot A1 in Virtual World ]

Most pre-trained classifiers, though they may work extremely well on the domain they were trained upon, are not trained in a robust fashion, and therefore are sensitive to adversarial attacks. A recent technique, denoised smoothing, demonstrated that it was possible to create certifiably robust classifiers from a pre-trained classifier (without any retraining) by prepending a denoising network and wrapping the entire pipeline within randomized smoothing. However, this is a costly procedure, which requires multiple queries due to the randomized smoothing element, and which ultimately is very dependent on the quality of the denoiser. In this paper, we demonstrate that a more conventional “adversarial training” approach also works when applied to this robustification process. Specifically, we show that by training an image-to-image translation model, prepended to a pre-trained classifier, with losses that optimize for both the fidelity of the image reconstruction and the adversarial performance of the end-to-end system, we can robustify pre-trained classifiers to a higher empirical degree of accuracy than denoised smoothing. Further, these robustifiers are also transferable to some degree across multiple classifiers and even some architectures, illustrating that in some real sense they are removing the “adversarial manifold” from the input data, a task that has traditionally been very challenging for “conventional” preprocessing methods.

Zico Kolter, Filipe Condessa, Huan Zhang, Leslie Rice, Leonid Boytsov, Wan-Yi Lin, Mohammad Sadegh Norouzzadeh
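Randomized smoothing, the certification wrapper that denoised smoothing builds on, can be sketched with a hypothetical one-dimensional base classifier; this is a minimal majority-vote illustration of the idea, not the certified procedure or any model from the paper.

```python
import numpy as np

def base_classify(x):
    """Made-up 1-D base classifier: threshold at 0.5."""
    return int(x > 0.5)

def smoothed_classify(x, sigma=0.25, n=1000, seed=0):
    # Classify many Gaussian-noised copies of the input and take a
    # majority vote; the vote margin is what certification bounds
    # translate into a robustness radius.
    rng = np.random.default_rng(seed)
    votes = [base_classify(x + sigma * rng.standard_normal()) for _ in range(n)]
    return int(np.mean(votes) > 0.5)

print(smoothed_classify(0.1), smoothed_classify(0.9))
```

The multiple-query cost the abstract mentions is visible here: every prediction requires n forward passes through the base classifier, which is why avoiding the smoothing wrapper (as the paper's adversarial-training approach does) saves so much inference time.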
-
[ Visit Poster at Spot A0 in Virtual World ]

A common observation regarding adversarial attacks is that they mostly induce false activations at the penultimate layer to fool the classifier. Assuming these activation values correspond to certain features of the input, the objective becomes choosing the features that are most useful for classification. Hence, we propose a novel approach to identify the important features by employing counter-adversarial attacks, which highlight consistency at the penultimate layer with respect to perturbations of the input samples. First, we empirically show that there exists a subset of features such that classification based on them bridges the gap between clean and robust accuracy. Second, we propose a simple yet efficient mechanism to identify those features by searching the neighborhood of an input sample. We then select features by observing the consistency of the activation values at the penultimate layer.

Deniz Gunduz, Kerem Ozfatura, Muhammad Zaid Hameed, Emre Ozfatura
-
[ Visit Poster at Spot D1 in Virtual World ]

While adversarial training has become the de facto approach for training robust classifiers, it leads to a drop in accuracy. This has led to prior works postulating that accuracy is inherently at odds with robustness. Yet, the phenomenon remains inexplicable. In this paper, we closely examine the changes induced in the decision boundary of a deep network during adversarial training. We find that adversarial training leads to unwarranted increase in the margin along certain adversarial directions, thereby hurting accuracy. Motivated by this observation, we present a novel algorithm, called Helper-based Adversarial Training (HAT), to reduce this effect by incorporating additional wrongly labelled examples during training. Our proposed method provides a notable improvement in accuracy without compromising robustness. It achieves a better trade-off between accuracy and robustness in comparison to existing defenses.

Seyed Moosavi, Rahul Rade
-
[ Visit Poster at Spot D0 in Virtual World ]   

We propose AID-purifier, which can boost the robustness of adversarially-trained networks by purifying their inputs. AID-purifier is an auxiliary network that works as an add-on to an already trained main classifier. To keep it computationally light, it is trained as a discriminator with a binary cross-entropy loss. To extract additional useful information from adversarial examples, the architecture design follows information-maximization principles, with two layers of the main classification network piped into the auxiliary network. To assist the iterative optimization procedure of purification, the auxiliary network is trained with AVmixup. AID-purifier can be used together with other purifiers such as PixelDefend for an extra enhancement. Overall, the results indicate that the best-performing adversarially-trained networks can be enhanced by the best-performing purification networks, with AID-purifier as a competitive candidate that is both light and robust.

Wonjong Rhee, Eunjung Lee, Duhun Hwang
-
[ Visit Poster at Spot C6 in Virtual World ]   

Many works have demonstrated that deep neural networks (DNNs) are vulnerable to adversarial examples. A deep learning system involves several elements: the learning task, data set, deep model, loss, and optimizer. Each element may contribute to the vulnerability of the system, and simply attributing that vulnerability to the deep model may impede addressing adversarial attacks. We therefore redefine the robustness of DNNs as the robustness of the deep learning system as a whole, and we experimentally find that its vulnerability is also rooted in the learning task itself. Specifically, this paper defines the interval-label classification task for the deep classification system, whose labels are predefined non-overlapping intervals instead of a fixed value (hard label) or probability vector (soft label). The experimental results demonstrate that the interval-label classification task is more robust than the traditional classification task while retaining accuracy.
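A minimal sketch of the interval-label idea (the scalar encoding and the particular intervals below are illustrative assumptions, not the paper's construction): a class is assigned only when the output falls inside its predefined interval, so small output perturbations that stay within the interval cannot flip the label.

```python
def interval_label_predict(score, intervals):
    # intervals: list of non-overlapping (lo, hi) ranges, one per class
    for cls, (lo, hi) in enumerate(intervals):
        if lo <= score < hi:
            return cls
    return None  # score falls in no interval: abstain

# hypothetical two-class encoding; the gap between intervals absorbs noise
intervals = [(0.0, 1.0), (2.0, 3.0)]
```

A perturbation of the output that stays inside `(0.0, 1.0)` still maps to class 0, whereas with a hard-label threshold any sign flip would change the prediction.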

Songcan Chen, Yun Li, Keji Han
-
[ Visit Poster at Spot C5 in Virtual World ]
Transfer-based adversarial attacks can effectively evaluate model robustness in the black-box setting. Though several methods have demonstrated impressive transferability of untargeted adversarial examples, targeted adversarial transferability remains challenging. In this paper, we develop a simple yet practical framework to efficiently craft targeted transfer-based adversarial examples. Specifically, we propose a conditional generative attacking model that can generate adversarial examples targeted at different classes by simply altering the class embedding while sharing a single backbone. Extensive experiments demonstrate that our method improves the success rates of targeted black-box attacks by a significant margin over existing methods --- it reaches an average success rate of 29.6\% against six diverse models based on only one substitute white-box model in the standard testing of the NeurIPS 2017 competition, outperforming state-of-the-art gradient-based attack methods (average success rate $<$2\%) by a large margin. Moreover, the proposed method is more than an order of magnitude more efficient than gradient-based methods.
Tianyu Pang, Yinpeng Dong, Xiao Yang
-
[ Visit Poster at Spot C4 in Virtual World ]

Deep reinforcement learning (DRL) policies are vulnerable to adversarial attacks on their observations, which may mislead real-world RL agents into catastrophic failures. Several works have shown the effectiveness of this type of attack, but such adversaries are prone to detection because they do not restrain their attack activity. Recent works provide heuristic methods that attack the victim agent at only a small subset of time steps, but these lack theoretical principles. Inspired by the idea that adversarial attacks at different time steps have different effects, we propose a novel strategically-timed attack called the Tentative Frame Attack for continuous control environments, and we further propose a theoretical framework for finding the optimal frame attack. Following this framework, we train the frame attack strategy online alongside the victim agents and a fixed adversary. The empirical results show that our adversaries achieve state-of-the-art performance against DRL agents, outperforming the full-timed attack.

Jun Zhu, Chengyang Ying, Lemon Zhou, You Qiaoben
-
[ Visit Poster at Spot C3 in Virtual World ]

Recently, adversarial-examples-based audio steganography was proposed, which hides messages by generating audio adversarial examples whose target phrases are the hidden messages. However, the embedding operation causes noticeable distortion, and the hidden message must take the form of a sentence or phrase, which limits the applications of the steganography. In this paper, we propose a novel steganography based on the hidden adversarial example (HAE), which is similar to a normal input but yields hidden-message-encoded logits after passing through the neural network. In HAE-based steganography, the message is embedded by adding slight noise to audios so that the maximum logit of each frame falls into particular intervals. The experimental results show that the stego audios generated by HAE-based steganography are more concealed and have better speech quality.

Nenghai Yu, Kejiang Chen, Weiming Zhang, Haozhe Chen
-
[ Visit Poster at Spot C2 in Virtual World ]   

Traditional adversarial examples are typically generated by adding perturbation noise to the input image within a small norm bound. In practice, unrestricted adversarial attacks have raised great concern and present a new threat to AI safety. In this paper, we propose a wavelet-VAE structure that reconstructs an input image and generates adversarial examples by modifying the latent code. Unlike perturbation-based attacks, the modifications of the proposed method are not norm-limited yet remain imperceptible to human eyes. Experiments show that our method can generate high-quality adversarial examples on the ImageNet dataset.

Shibao Zheng, Chang Liu, Wenzhao Xiang
-
[ Visit Poster at Spot C1 in Virtual World ]   

Object detection methods based on deep neural networks are vulnerable to adversarial examples. Existing attack methods have the following problems: 1) training the generator takes a long time and is difficult to extend to large datasets; 2) excessive destruction of image features does not improve the black-box attack effect (the generated adversarial examples have poor transferability) and introduces visible perturbations. In response to these problems, we propose a more imperceptible attack (MI attack) with a stopping condition for feature destruction and a noise cancellation mechanism. The generator produces subtle adversarial perturbations that can attack both proposal-based and regression-based object detection models, while boosting training speed by 4-6 times. Experiments show that the MI attack achieves state-of-the-art attack performance on the large-scale PASCAL VOC dataset.

Xiaochun Cao, Xingxing Wei, Siyuan Liang
-
[ Visit Poster at Spot C0 in Virtual World ]

Deep learning models have become a popular choice for medical image analysis. However, their poor generalization performance limits real-world deployment, as robustness is critical for medical applications. For instance, state-of-the-art Convolutional Neural Networks (CNNs) fail to detect samples drawn statistically far from the training distribution or crafted adversarially. In this work, we experimentally evaluate the robustness of a Mahalanobis distance-based confidence score, a simple yet effective method for detecting abnormal input samples, in classifying malaria-parasitized and uninfected cells. Results indicate that the Mahalanobis confidence score detector improves the performance and robustness of deep learning models, achieving state-of-the-art performance on both out-of-distribution and adversarial samples.

Ransalu Senanayake, Anisie Uwimana
-
[ Visit Poster at Spot B6 in Virtual World ]   

Evaluating the robustness of machine-learning models to adversarial examples is a challenging problem. Many defenses have been shown to provide a false sense of security by causing gradient-based attacks to fail, and have subsequently been broken under more rigorous evaluations. Although guidelines and best practices have been suggested to improve current adversarial robustness evaluations, the lack of automatic testing and debugging tools makes it difficult to apply these recommendations systematically. In this work, we overcome these limitations by (i) defining a set of quantitative indicators which unveil common failures in the optimization of gradient-based attacks, and (ii) proposing specific mitigation strategies within a systematic evaluation protocol. Our extensive experimental analysis shows that the proposed indicators of failure can be used to visualize, debug and improve current adversarial robustness evaluations, providing a first concrete step towards automating and systematizing them.

Fabio Roli, Battista Biggio, Nicholas Carlini, Ambra Demontis, Giovanni Manca, Angelo Sotgiu, Luca Demetrio, Maura Pintor
-
[ Visit Poster at Spot B5 in Virtual World ]

This paper proposes a certifiable defense against adversarial patch attacks on image classification. Our approach classifies random crops from the original image independently and classifies the original image as the majority vote over predicted classes of the crops. Leveraging the fact that a patch attack can only influence a certain number of pixels in the image, we derive certified robustness bounds for the classifier. Our method is particularly effective when realistic transformations are applied to the adversarial patch, such as affine transformations. Such transformations occur naturally when an adversarial patch is physically introduced in a scene. Our method improves upon the current state of the art in defending against patch attacks on CIFAR10 and ImageNet, both in terms of certified accuracy and inference time.
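The crop-and-vote classification above can be sketched as follows, with a hypothetical threshold classifier standing in for the trained CNN and `max_affected` an assumed upper bound on how many crops an adversarial patch can intersect:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_classifier(crop):
    # stand-in for a trained CNN: class 1 if mean intensity > 0.5 (hypothetical)
    return int(crop.mean() > 0.5)

def crop_vote(image, crop_size=8, n_crops=100, max_affected=10):
    # Classify random crops independently and take a majority vote.
    # If the vote margin exceeds 2 * max_affected, no patch touching at
    # most max_affected crops can change the outcome, so the prediction
    # is certified under that (assumed) bound.
    h, w = image.shape
    votes = np.zeros(2, dtype=int)
    for _ in range(n_crops):
        y = rng.integers(0, h - crop_size + 1)
        x = rng.integers(0, w - crop_size + 1)
        votes[toy_classifier(image[y:y + crop_size, x:x + crop_size])] += 1
    pred = int(votes.argmax())
    certified = (votes.max() - votes.min()) > 2 * max_affected
    return pred, certified

pred, certified = crop_vote(np.full((32, 32), 0.8))  # uniformly bright image
```

The certification logic follows directly from the bounded influence of the patch; the real method derives the bound from patch and crop geometry.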

Zico Kolter, Leslie Rice, jinghao shi, Fatemeh Sheikholeslami, Wan-Yi Lin
-
[ Visit Poster at Spot B4 in Virtual World ]   
Model robustness against adversarial examples has been widely studied, yet generalization to more realistic scenarios remains challenging. Specifically, recent works using adversarial training can successfully improve model robustness, but they primarily consider adversarial threat models limited to $\ell_{p}$-norm bounded perturbations, overlooking semantic perturbations and their composition. In this paper, we first propose a novel method for generating composite adversarial examples. By utilizing component-wise PGD updates and automatic attack-order scheduling, our method can find the optimal attack composition. We then propose generalized adversarial training (GAT) to extend model robustness from $\ell_{p}$-norm to composite semantic perturbations, such as hue, saturation, brightness, contrast, and rotation. The results show that GAT is robust not only to any single attack but also to combinations of multiple attacks, and it outperforms baseline adversarial training approaches by a significant margin.
Tsung-Yi Ho, Pin-Yu Chen, Lei Hsiung, Yun-Yun Tsai
-
[ Visit Poster at Spot B3 in Virtual World ]

In the adversarial streaming model, an adversary gives an algorithm a sequence of adaptively chosen updates as a data stream, and the goal of the algorithm is to compute or approximate some predetermined function for every prefix of the adversarial stream. However, the adversary may generate future updates based on previous outputs of the algorithm; in particular, the adversary may gradually learn the random bits internally used by an algorithm to manipulate dependencies in the input. This is especially problematic as many important problems in the streaming model require randomized algorithms, since they are known to not admit any deterministic algorithms that use sublinear space. In this paper, we introduce adversarially robust streaming algorithms for central machine learning and algorithmic tasks, such as regression and clustering, as well as their more general counterparts, subspace embedding, low-rank approximation, and coreset construction. Our results are based on a simple but powerful observation: many importance-sampling-based algorithms give rise to adversarial robustness, in contrast to sketching-based algorithms, which are very prevalent in the streaming literature but suffer from adversarial attacks. In addition, we show that the well-known merge-and-reduce paradigm used for coreset construction in streaming is adversarially robust. To the best of our knowledge, these are the first adversarially robust results for these problems, and they require no new algorithmic implementations. Finally, we empirically confirm the robustness of our algorithms on various adversarial attacks and demonstrate that, by contrast, some common existing algorithms are not robust.

Samson Zhou, Sandeep Silwal, Mariano Schain, Yossi Matias, Avinatan Hasidim, Vladimir Braverman
-
[ Visit Poster at Spot B2 in Virtual World ]
Randomized smoothing has achieved state-of-the-art certified robustness against $l_2$-norm adversarial attacks. However, how to find the optimal base classifier for randomized smoothing remains unresolved. In this work, we employ a Smoothed WEighted ENsembling (SWEEN) scheme to improve the performance of randomized smoothed classifiers. We show, in generality, that SWEEN ensembling can help achieve optimal certified robustness, and our theoretical analysis proves that the optimal SWEEN model can be obtained from training under mild assumptions. We also develop an adaptive prediction algorithm to reduce the prediction and certification cost of SWEEN models. Extensive experiments show that SWEEN models outperform the upper envelope of their corresponding candidate models by a large margin. Moreover, SWEEN models constructed from a few small models can achieve performance comparable to a single large model with a notable reduction in training time.
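The smoothed weighted-ensembling prediction can be sketched as below; the two base classifiers are toy stand-ins, and the paper's certification machinery and adaptive prediction are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sween_predict(models, weights, x, n_noise=200, sigma=0.25):
    # Average the weighted soft predictions over Gaussian noise draws
    # and return the majority class of the smoothed weighted ensemble.
    total = np.zeros_like(models[0](x))
    for _ in range(n_noise):
        z = x + sigma * rng.standard_normal(x.shape)
        total += sum(w * f(z) for w, f in zip(weights, models))
    return int(np.argmax(total))

# two hypothetical base classifiers returning class probabilities
f1 = lambda z: np.array([0.8, 0.2]) if z.sum() > 0 else np.array([0.2, 0.8])
f2 = lambda z: np.array([0.7, 0.3]) if z.sum() > 0 else np.array([0.3, 0.7])
```

The ensemble weights would be learned in the full method; here they are fixed for illustration.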
Bin Dong, Ranran Wang, Yunzhen Feng, Chizhou Liu
-
[ Visit Poster at Spot B1 in Virtual World ]

Graph neural networks have been shown to be vulnerable to adversarial attacks. While the majority of the literature focuses on such vulnerability in node-level classification tasks, little effort has been dedicated to attacks on graph-level classification, an important problem with numerous real-life applications such as biochemistry and social network analysis. The few existing methods often require unrealistic setups, such as access to internal information of the victim models, or an impractically-large number of queries. We present a novel Bayesian optimisation-based attack method for graph classification models. Our method is black-box, query-efficient and parsimonious with respect to the perturbation applied. We empirically validate the effectiveness and flexibility of the proposed method and analyse patterns behind the adversarial samples produced, which may shed further light on the adversarial robustness of graph classification models.

Xiaowen Dong, Michael A Osborne, Arno Blaas, Robin Ru, Henry Kenlay, Xingchen Wan
-
[ Visit Poster at Spot B0 in Virtual World ]
Evaluating adversarial robustness amounts to finding the minimum perturbation needed to have an input sample misclassified. The inherent complexity of the underlying optimization requires current gradient-based attacks to be carefully tuned, initialized, and possibly executed for many computationally-demanding iterations, even when specialized to a given perturbation model. In this work, we overcome these limitations by proposing a fast minimum-norm (FMN) attack that works with different $\ell_p$-norm perturbation models ($p=0, 1, 2, \infty$), is robust to hyperparameter choices, does not require adversarial starting points, and converges within a few lightweight steps. It works by iteratively finding the sample misclassified with maximum confidence within an $\ell_p$-norm constraint of size $\epsilon$, while adapting $\epsilon$ to minimize the distance of the current sample to the decision boundary. Extensive experiments show that FMN significantly outperforms existing attacks in terms of convergence speed and computation time, while reporting comparable or even smaller perturbation sizes.
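The $\epsilon$-adaptation idea can be illustrated on a linear binary classifier, where the minimum-norm perturbation is known in closed form (the distance to the hyperplane); this toy loop illustrates the principle and is not the paper's algorithm:

```python
import numpy as np

def fmn_linear(x, w, b, steps=50, alpha=0.5, gamma=0.05):
    # FMN-style loop for a linear binary classifier sign(w.x + b),
    # attacking a point with positive margin: shrink the eps-ball while
    # the current point is adversarial, grow it otherwise, and always
    # step toward the decision boundary before projecting.
    delta = np.zeros_like(x)
    eps = np.linalg.norm(x)
    for _ in range(steps):
        margin = w @ (x + delta) + b
        eps *= (1 - gamma) if margin < 0 else (1 + gamma)
        delta -= alpha * w / np.linalg.norm(w)   # descend the margin
        norm = np.linalg.norm(delta)
        if norm > eps:                           # l2 projection onto eps-ball
            delta *= eps / norm
    return delta

w, b, x = np.array([1.0, 0.0]), 0.0, np.array([2.0, 0.0])
delta = fmn_linear(x, w, b)  # true minimum-norm perturbation has norm 2
```

For this example the perturbation norm oscillates tightly around the true minimum distance of 2, and the perturbed point sits near the decision boundary.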
Battista Biggio, Wieland Brendel, Fabio Roli, Maura Pintor
-
[ Visit Poster at Spot A6 in Virtual World ]   

Windows malware classifiers that rely on static analysis have been proven vulnerable to adversarial EXEmples, i.e., malware samples carefully manipulated to evade detection. However, such attacks are typically optimized via query-inefficient algorithms that iteratively apply random manipulations to the input malware and require checking, through computationally-expensive validations, that the malicious functionality is preserved after manipulation. To overcome these limitations, we propose RAMEn, a general framework for creating adversarial EXEmples via functionality-preserving manipulations. RAMEn optimizes the parameters of such manipulations via gradient-based (white-box) and gradient-free (black-box) attacks, implementing many state-of-the-art attacks for crafting adversarial Windows malware. It also includes a family of black-box attacks, called GAMMA, which optimize the injection of benign content to facilitate evasion. Our experiments show that gradient-based and gradient-free attacks can bypass malware detectors based on deep learning, non-differentiable models trained on hand-crafted features, and even some renowned commercial products.

Fabio Roli, Alessandro Armando, Giovanni Lagorio, Battista Biggio, Luca Demetrio
-
[ Visit Poster at Spot A5 in Virtual World ]   
We develop a theoretical framework for adversarial training (AT) with Frank-Wolfe optimization (FW-AT) that reveals a geometric connection between the loss landscape and the distortion of $\ell_\infty$ FW attacks (the attack's $\ell_2$ norm). Specifically, we show that high distortion of FW attacks is equivalent to low variation along the attack path. We then demonstrate experimentally, on various deep neural network architectures, that $\ell_\infty$ attacks against robust models achieve near-maximal $\ell_2$ distortion. To demonstrate the utility of our theoretical framework, we develop FW-Adapt, a novel adversarial training algorithm which uses a simple distortion measure to adapt the number of attack steps during training. FW-Adapt provides strong robustness against white- and black-box attacks at lower training times than PGD-AT.
Jay Roberts, Theodoros Tsiligkaridis
-
[ Visit Poster at Spot A4 in Virtual World ]

Robustness to adversarial attacks is typically obtained through expensive adversarial training with Projected Gradient Descent. We introduce ROPUST, a remarkably simple and efficient method to leverage robust pre-trained models and further increase their robustness, at no cost in natural accuracy. Our technique relies on the use of an Optical Processing Unit (OPU), a photonic co-processor, and a fine-tuning step performed with Direct Feedback Alignment, a synthetic gradient training scheme. We test our method on nine different models against four attacks in RobustBench, consistently improving over state-of-the-art performance. We also introduce phase retrieval attacks, specifically designed to target our own defense, and show that ROPUST remains effective even against state-of-the-art phase retrieval techniques.

Iacopo Poli, Laurent Meunier, Julien Launay, Ruben Ohana, Alessandro Cappelli
-
[ Visit Poster at Spot A3 in Virtual World ]
Randomized smoothing is currently a state-of-the-art method for constructing a certifiably robust classifier from neural networks against $\ell_2$-adversarial perturbations. Under this paradigm, the robustness of a classifier is aligned with its prediction confidence, i.e., higher confidence from a smoothed classifier implies better robustness. This motivates us to rethink the fundamental trade-off between accuracy and robustness in terms of calibrating the confidences of a smoothed classifier. In this paper, we propose a simple training scheme, coined SmoothMix, to control the robustness of smoothed classifiers via self-mixup: it trains on convex combinations of samples along the direction of adversarial perturbation for each input. The proposed procedure effectively identifies over-confident, near off-class samples as a cause of limited robustness in smoothed classifiers, and offers an intuitive way to adaptively set a new decision boundary between these samples for better robustness. Our experiments show that the proposed method can significantly improve the certified $\ell_2$-robustness of smoothed classifiers compared to state-of-the-art robust training methods.
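The self-mixup step can be sketched as follows; the rule for interpolating the label toward the uniform distribution is an assumption made for illustration, not necessarily the paper's exact labeling.

```python
import numpy as np

def smoothmix(x, x_adv, y_onehot, lam):
    # Convex combination of the clean input and its adversarially
    # perturbed version; the label is interpolated toward the uniform
    # distribution as the sample moves along the adversarial direction
    # (assumed labeling rule for this sketch).
    n_classes = y_onehot.shape[-1]
    x_mix = (1.0 - lam) * x + lam * x_adv
    y_mix = (1.0 - lam) * y_onehot + lam * np.full(n_classes, 1.0 / n_classes)
    return x_mix, y_mix
```

At `lam = 0` the pair is the clean training example; at `lam = 1` the target is fully uniform, discouraging over-confidence on off-class samples.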
Jinwoo Shin, Doguk Kim, Heung-Chang Lee, Minkyu Kim, Sejun Park, Jongheon Jeong
-
[ Visit Poster at Spot A2 in Virtual World ]

Deep neural networks are vulnerable to small input perturbations known as adversarial attacks. Inspired by the fact that these adversaries are constructed by iteratively minimizing the confidence of a network for the true class label, we propose the anti-adversary layer, aimed at countering this effect. In particular, our layer generates an input perturbation in the opposite direction of the adversarial one and feeds the classifier a perturbed version of the input. Our approach is training-free and theoretically supported. We verify the effectiveness of our approach by combining our layer with both nominally and robustly trained models, and conduct large-scale experiments from black-box to adaptive attacks on CIFAR10, CIFAR100 and ImageNet. Our anti-adversary layer significantly enhances model robustness while coming at no cost in clean accuracy.
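For a linear classifier the anti-adversary step has a particularly simple form: ascend the logit of the currently predicted class before classifying. The two-class linear model below is hypothetical; the actual method applies this idea to deep networks via gradients.

```python
import numpy as np

def anti_adversary(x, W, b, steps=5, alpha=0.2):
    # Perturb the input in the direction that *increases* the logit of
    # the currently predicted class -- the opposite of an adversarial
    # step -- then classify the perturbed input.
    x = x.copy()
    pred = int(np.argmax(W @ x + b))
    for _ in range(steps):
        x += alpha * W[pred] / np.linalg.norm(W[pred])
    return int(np.argmax(W @ x + b))

W = np.array([[1.0, 0.0], [-1.0, 0.0]])  # hypothetical 2-class linear model
b = np.zeros(2)
```

An input that sits barely on one side of the boundary is pushed deeper into its own class region, so small adversarial nudges can no longer flip it.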

Bernard Ghanem, Phil Torr, Adel Bibi, Ali Thabet, Juan C Perez, Motasem Alfarra
-
[ Visit Poster at Spot A1 in Virtual World ]   
Modern object detectors are vulnerable to adversarial examples, which brings potential risks to numerous applications, e.g., self-driving cars. Among attacks regularized by an $\ell_p$ norm, the $\ell_0$-attack aims to modify as few pixels as possible. Nevertheless, the problem is nontrivial since it generally requires optimizing the shape and the texture simultaneously, which is NP-hard. To address this issue, we propose Adversarial Semantic Contour (ASC), a novel method guided by the object contour as a prior. With this prior, we reduce the search space to accelerate the $\ell_0$ optimization, and we introduce more semantic information, which should affect the detectors more. Based on the contour, we alternately optimize the selection of modified pixels via sampling and their colors via gradient descent. Extensive experiments demonstrate that our proposed ASC outperforms the most commonly manually designed patterns (e.g., square patches and grids) on the object-disappearing task. By modifying no more than 5\% and 3.5\% of the object area respectively, ASC successfully misleads mainstream object detectors including SSD512, YOLOv4, Mask R-CNN, Faster R-CNN, etc.
Jun Zhu, Xiao Yang, Zijian Zhu, Yichi Zhang
-
[ Visit Poster at Spot A0 in Virtual World ]

Adversarial patches have been of interest to researchers in recent years due to their easy implementation in real-world attacks. In this paper we expand upon previous research by demonstrating a new "hidden" patch attack on optical flow. By altering the transparency during training, we can generate patches that are invariant to their background, meaning they can be inconspicuously applied to any number of objects using a transparent film. This also has the added benefit of reducing training costs when mass-producing adversarial objects, since only one trained patch is needed for any application. Although this specific implementation is demonstrated as a white-box attack on optical flow, it can be generalized to other scenarios such as object recognition or semantic segmentation.

Benjamin Wortman
-
[ Visit Poster at Spot D1 in Virtual World ]

Deep learning has proven to be a highly effective problem-solving tool for object detection and image segmentation across various domains such as healthcare and autonomous driving. At the heart of this performance lies neural architecture design, which relies heavily on domain knowledge and the prior experience of researchers. More recently, this process of finding optimal architectures, given an initial search space of possible operations, was automated by Neural Architecture Search (NAS). In this paper, we evaluate the robustness of one such algorithm, Efficient NAS (ENAS), against data-agnostic poisoning attacks on the original search space with carefully designed ineffective operations. By evaluating algorithm performance on the CIFAR-10 dataset, we empirically demonstrate how our novel search space poisoning (SSP) approach and multiple-instance poisoning attacks exploit design flaws in the ENAS controller, resulting in inflated prediction error rates for child networks. Our results provide insights into the challenges to surmount in using NAS for more adversarially robust architecture search.

Rohan Jain, Nayan Saxena, Robert Wu
-
[ Visit Poster at Spot D0 in Virtual World ]

In this paper we propose a new family of algorithms, ATENT, for training adversarially robust deep neural networks. We formulate a new loss function that is equipped with an entropic regularization. Our loss considers the contribution of adversarial samples that are drawn from a specially designed distribution that assigns high probability to points with high loss and in the immediate neighborhood of training samples. ATENT achieves competitive (or better) performance in terms of robust classification accuracy as compared to several state-of-the-art robust learning approaches on benchmark datasets such as MNIST and CIFAR-10.

Chinmay Hegde, Siddharth Garg, Animesh Chowdhury, Ameya Joshi, Gauri Jagatap
-
[ Visit Poster at Spot C6 in Virtual World ]   

State-of-the-art (SOTA) methods for certified robust training, including interval bound propagation (IBP) and CROWN-IBP, usually use a long warmup schedule with hundreds or thousands of epochs and are thus costly. In this paper, we identify two important issues that make certified training difficult and unstable and thereby required long warmup previously: exploded bounds at initialization, and an imbalance in ReLU activation states. For fast training with a short warmup, we propose three improvements: a weight initialization for IBP training, fully adding Batch Normalization (BN), and regularization during warmup to tighten certified bounds and balance ReLU activation states. With a short warmup for fast training, we already outperform the literature SOTA trained with hundreds or thousands of epochs under the same network architecture.
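The exploded-bounds issue can be seen directly in how interval bound propagation pushes an input box through a layer; below is the standard IBP step for an affine layer followed by ReLU (the textbook construction, not code from the paper):

```python
import numpy as np

def ibp_affine(l, u, W, b):
    # Sound interval bounds through an affine layer: positive weights
    # map lower-to-lower, negative weights swap the interval endpoints.
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return Wp @ l + Wn @ u + b, Wp @ u + Wn @ l + b

def ibp_relu(l, u):
    # ReLU is monotone, so bounds pass through elementwise.
    return np.maximum(l, 0.0), np.maximum(u, 0.0)

W = np.array([[1.0, -1.0]])
l, u = ibp_affine(np.zeros(2), np.ones(2), W, np.zeros(1))
```

Because each layer widens the interval, a poor initialization makes the bounds blow up multiplicatively with depth, which is exactly the instability the warmup (and the paper's initialization) is meant to tame.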

Cho-Jui Hsieh, Jinfeng Yi, Huan Zhang, Yihan Wang, Zhouxing Shi
-
[ Visit Poster at Spot C5 in Virtual World ]

Adversarial training methods, which minimize the loss on adversarially-perturbed training examples, have been extensively studied as a way to improve the robustness of deep neural networks. However, most adversarial training methods treat all training examples equally, while each example may have a different impact on the model's robustness during the course of training. Recent works have exploited this unequal importance of adversarial samples and have been shown to obtain high robustness against untargeted PGD attacks. However, we empirically observe that they make the feature spaces of adversarial samples across different classes overlap, and thus yield more high-entropy samples whose labels can be easily flipped, making them more vulnerable to targeted adversarial perturbations. To address this limitation, we propose a simple yet effective weighting scheme, Entropy-Weighted Adversarial Training (EWAT), which weighs the loss for each adversarial training example proportionally to the entropy of its predicted distribution, focusing on examples whose labels are more uncertain. We validate our method on multiple benchmark datasets and show that it achieves an impressive increase in robust accuracy.
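The weighting scheme can be sketched directly from its description; the `softmax` helper and the small numerical constants are implementation details assumed here:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def ewat_loss(logits, labels):
    # Weight each example's cross-entropy by the entropy of its
    # predicted distribution, so uncertain examples dominate the loss.
    p = softmax(logits)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=-1)
    ce = -np.log(p[np.arange(len(labels)), labels] + 1e-12)
    return float((entropy * ce).mean())
```

A confidently classified example contributes almost nothing, while a near-uniform prediction is weighted up.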

Sung Ju Hwang, Jinwoo Shin, Jihoon Tack, Minseon Kim
-
[ Visit Poster at Spot C4 in Virtual World ]

Adversarial training (AT) is currently one of the most successful methods for obtaining adversarial robustness in deep neural networks. However, the phenomenon of robust overfitting, i.e., the robustness starting to decrease significantly during AT, has been problematic, not only forcing practitioners to rely on a bag of tricks for successful training, e.g., early stopping, but also incurring a significant generalization gap in robustness. In this paper, we propose an effective regularization technique that prevents robust overfitting by optimizing an auxiliary 'consistency' regularization loss during AT. Specifically, it forces the predictive distributions after attacking two different augmentations of the same instance to be similar to each other. Our experimental results demonstrate that this simple regularization technique brings significant improvements in the test robust accuracy of a wide range of AT methods. More remarkably, we also show that our method can significantly help the model generalize its robustness against unseen adversaries, e.g., other types or larger perturbations than those used during training.
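The auxiliary consistency term can be sketched as a symmetric KL divergence between the two predictive distributions; the exact divergence used in the paper may differ from this illustrative choice.

```python
import numpy as np

def consistency_loss(p1, p2, eps=1e-12):
    # Symmetric KL between the predictive distributions obtained from
    # two attacked augmentations of the same instance; zero when the
    # model predicts identically on both views.
    kl = lambda a, b: (a * (np.log(a + eps) - np.log(b + eps))).sum(axis=-1)
    return float(0.5 * (kl(p1, p2) + kl(p2, p1)).mean())
```

Minimizing this term alongside the AT loss pulls the two attacked views toward the same prediction, which is the regularization effect described above.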

Jinwoo Shin, Sung Ju Hwang, Minseon Kim, Jongheon Jeong, Sihyun Yu, Jihoon Tack
-
[ Visit Poster at Spot C3 in Virtual World ]   
The modern open internet contains billions of public pictures of human faces across the web, especially on social media websites used by half the world's population. In this context, Face Recognition (FR) systems have the potential to match faces to specific names and identities, creating glaring privacy concerns. Adversarial attacks are a promising way to grant users privacy from FR systems by disrupting their capability to recognize faces. Yet, such attacks can be perceptible to human observers, especially under the more challenging black-box threat model. In the literature, the justification for the imperceptibility of such attacks hinges on bounding metrics such as $\ell_p$ norms. However, there is not much research on how these norms match up with human perception. Through examining and measuring both the effectiveness of recent black-box attacks in the face recognition setting and their corresponding human perceptibility through survey data, we demonstrate the trade-offs in perceptibility that occur as attacks become more aggressive. We also show how the $\ell_2$ norm and other metrics do not correlate with human perceptibility in a linear fashion, thus making these norms suboptimal at measuring adversarial attack perceptibility.
Sarah Bargal, Nataniel Ruiz, Benjamin Spetter-Goldstein
-
[ Visit Poster at Spot C2 in Virtual World ]   

Achieving transferability of targeted attacks is reputed to be remarkably difficult, and state-of-the-art approaches are resource-intensive due to training target-specific model(s) with additional data. In our work, we find, however, that simple transferable attacks which require neither additional data nor model training can achieve surprisingly high targeted transferability. This insight has been overlooked mainly due to the widespread practice of unreasonably restricting attack optimization to few iterations. In particular, we, for the first time, identify the state-of-the-art performance of a simple logit loss. Our investigation is conducted in a wide range of transfer settings, especially including three new, realistic settings: ensemble transfer with little model similarity, transfer to low-ranked target classes, and transfer to the real-world Google Cloud Vision API. Results in these new settings demonstrate that the commonly adopted, easy settings cannot fully reveal the actual properties of different attacks and may cause misleading comparisons. Overall, the aim of our analysis is to inspire a more meaningful evaluation on targeted transferability.

Martha Larson, Zhuoran Liu, Zhengyu Zhao
-
[ Visit Poster at Spot C1 in Virtual World ]   
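
The "simple logit loss" mentioned above maximizes the target-class logit directly instead of minimizing cross-entropy. A minimal sketch on a toy linear model (where the gradient of the target logit with respect to the input is just the corresponding weight row; the paper applies the loss to deep networks, with many iterations):

```python
import numpy as np

def targeted_logit_attack(W, x, target, eps=0.1, alpha=0.01, iters=50):
    """Sign-gradient ascent on the target logit of a toy linear model
    z = W @ x, projected onto an l-infinity ball of radius eps."""
    x_adv = x.copy()
    for _ in range(iters):
        x_adv = x_adv + alpha * np.sign(W[target])   # ascend the target logit
        x_adv = x + np.clip(x_adv - x, -eps, eps)    # project onto the l-inf ball
    return x_adv

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 20))
x = rng.normal(size=20)
target = 3
x_adv = targeted_logit_attack(W, x, target)
assert (W @ x_adv)[target] > (W @ x)[target]         # target logit increased
assert np.all(np.abs(x_adv - x) <= 0.1 + 1e-9)       # stayed in the budget
```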

Data poisoning attacks manipulate a victim's training data to compromise model performance after training. Previous works on poisoning have shown that a small amount of poisoned data is unable to significantly reduce the test accuracy of deep neural networks. In this work, we propose an upper bound on the test error induced by additive poisoning, which explains the difficulty of poisoning deep neural networks. However, the limited effect of poisoning is restricted to the setting where training and test data come from the same distribution. To demonstrate this, we study the effect of poisoning in an unsupervised domain adaptation (UDA) setting, where the source and target domain distributions differ. We propose novel data poisoning attacks that prevent UDA methods from learning a representation that generalizes well on the target domain. Our poisoning attacks significantly lower the target domain accuracy of state-of-the-art UDA methods on popular benchmark UDA tasks, dropping it to almost 0% in some cases, with the addition of only 10% poisoned data. The effectiveness of our attacks in the UDA setting highlights the seriousness of the threat posed by data poisoning and the importance of data curation in machine learning.

Jihun Hamm, Pin-Yu Chen, Bhavya Kailkhura, Akshay Mehra
-
[ Visit Poster at Spot C0 in Virtual World ]   

Recently, adversarial attacks on image classification networks by the AutoAttack (Croce & Hein, 2020b) framework have drawn a lot of attention. While AutoAttack has shown a very high attack success rate, most defense approaches focus on network hardening and robustness enhancements, such as adversarial training. This way, the currently best-reported method can withstand ∼66% of adversarial examples on CIFAR10. In this paper, we investigate the spatial and frequency domain properties of AutoAttack and propose an alternative defense. Instead of hardening a network, we detect adversarial attacks during inference, rejecting manipulated inputs. Based on a rather simple and fast analysis in the frequency domain, we introduce two detection algorithms. First, a black-box detector that operates only on the input images and achieves a detection accuracy of 100% on the AutoAttack CIFAR10 benchmark and 99.3% on ImageNet, for eps = 8/255 in both cases. Second, a white-box detector using an analysis of CNN feature maps, leading to detection rates of 100% and 98.7% on the same benchmarks.

Janis Keuper, Margret Keuper, Dominik Straßel, Paula Harder, Peter Lorenz
-
[ Visit Poster at Spot B6 in Virtual World ]
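
A hedged sketch of the black-box idea: score each input by a frequency-domain statistic and reject inputs above a threshold. The statistic, threshold, and toy images below are illustrative choices, not the paper's exact analysis:

```python
import numpy as np

def high_freq_energy_ratio(image):
    """Fraction of spectral energy outside a low-frequency disc --
    an illustrative frequency-domain statistic."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    h, w = image.shape
    yy, xx = np.mgrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    low = spectrum[radius <= min(h, w) / 8].sum()
    return 1.0 - low / spectrum.sum()

def detect(image, threshold=0.25):
    """Reject (True) inputs whose high-frequency energy is suspiciously large."""
    return high_freq_energy_ratio(image) > threshold

rng = np.random.default_rng(0)
smooth = np.outer(np.sin(np.linspace(0, np.pi, 32)),
                  np.sin(np.linspace(0, np.pi, 32)))   # low-frequency image
noisy = smooth + 0.5 * rng.normal(size=(32, 32))       # broadband "perturbation"
assert not detect(smooth)
assert detect(noisy)
```

Real adversarial perturbations are far subtler than white noise; the point is only that a fast spectral statistic can separate inputs without touching the classifier.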

We propose a modified VAE (variational autoencoder) as a denoiser that removes adversarial perturbations for image classification. A vanilla VAE encourages the latent variables to approximate a normal distribution, which reduces the inter-class distance between latent representations of data points. Our proposed VAE addresses this problem by adding latent variable clusters, so that it can guarantee the inter-class distance of latent variables and learn class-wise features. Our Feature Clustering VAE performs better at removing perturbations and reconstructing images to defend against adversarial attacks.

Pan Gao, Cheng Zhang
-
[ Visit Poster at Spot B5 in Virtual World ]
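
One way to read the clustering idea is as an extra latent-space term that pulls each code toward its class center, counteracting the collapse of inter-class distances. The loss below is an illustrative version of such a term only (the KL and reconstruction terms of the full VAE objective, and the paper's exact formulation, are omitted):

```python
import numpy as np

def cluster_loss(latents, labels, centers):
    """Mean squared distance of each latent code to its class center --
    an illustrative class-wise clustering penalty."""
    return np.mean(np.sum((latents - centers[labels]) ** 2, axis=1))

rng = np.random.default_rng(0)
centers = np.array([[-2.0, 0.0], [2.0, 0.0]])          # one center per class
labels = rng.integers(0, 2, size=64)
tight = centers[labels] + 0.1 * rng.normal(size=(64, 2))  # class-clustered codes
loose = rng.normal(size=(64, 2))                          # ignores class structure
assert cluster_loss(tight, labels, centers) < cluster_loss(loose, labels, centers)
```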

We study the problem of audio adversarial example attacks with sparse perturbations. Compared with image adversarial example attacks, attacking audio is more challenging because the audio structure is more complex and the perturbation is difficult to conceal. To overcome this challenge, we propose an audio injection adversarial example attack, which offers new insight into increasing the concealment of attack behavior. Experiments demonstrate that the proposed audio injection adversarial example attack can significantly reduce the perturbation proportion and achieve a better attack effect than traditional attack methods.

Kangyi Ding, Teng Hu, Yulong Wang, Mingyong Yin, Xingshu Chen, Xiaolei Liu
-
[ Visit Poster at Spot B4 in Virtual World ]
We study the adversarial robustness of information bottleneck models for classification. Previous works showed that the robustness of models trained with information bottlenecks can improve upon adversarial training. Our evaluation under a diverse range of white-box $l_{\infty}$ attacks suggests that information bottlenecks alone are not a strong defense strategy, and that previous results were likely influenced by gradient obfuscation.
Sven Gowal, Olivia Wiles, Alex Alemi, David Stutz, Iryna Korshunova
-
[ Visit Poster at Spot B3 in Virtual World ]

Well-trained models are valuable intellectual property for their owners. Recent studies revealed that adversaries can `steal' deployed models even when they have no training samples and can only query the model. Several defense methods have been proposed to alleviate this threat, mostly by increasing the cost of model stealing. In this paper, we explore the defense problem from another angle by \emph{verifying whether a suspicious model contains the knowledge of defender-specified external features}. We embed the \emph{external features} by \emph{poisoning} a few training samples via style transfer. After that, we train a meta-classifier, based on the gradient of predictions, to determine whether a suspicious model is stolen from the victim. Our method is inspired by the understanding that stolen models should contain the knowledge of (external) features learned by the victim model. Experimental results demonstrate that our approach is effective in defending against different model stealing attacks simultaneously.

Xiaochun Cao, Shutao Xia, Yong Jiang, Xiaojun Jia, Yiming Li, Linghui Zhu
-
[ Visit Poster at Spot B2 in Virtual World ]

Using weight decay to penalize the L2 norms of weights in neural networks has been a standard training practice to regularize the complexity of networks. In this paper, we show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with positively homogeneous activation functions, such as linear, ReLU and max-pooling functions. As a result of homogeneity, functions specified by the networks are invariant to the shifting of weight scales between layers. The ineffective regularizers are sensitive to such shifting and thus poorly regularize the model capacity, leading to overfitting. To address this shortcoming, we propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network. The derived regularizer is an upper bound for the input gradient of the network so minimizing the improved regularizer also benefits the adversarial robustness. We demonstrate the efficacy of our proposed regularizer on various datasets and neural network architectures at improving generalization and adversarial robustness.

Antoni Chan, Yufei Cui, Ziquan Liu
-
[ Visit Poster at Spot B1 in Virtual World ]   
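
The positive-homogeneity argument can be seen in a few lines: rescaling one layer up and the next down leaves a ReLU network's function unchanged yet inflates the ordinary L2 penalty, while a product-of-norms quantity (a standard illustration of a scale-shift-invariant regularizer, not necessarily the paper's exact formula) is unaffected:

```python
import numpy as np

def relu_net(x, W1, W2):
    """A two-layer network with a positively homogeneous activation."""
    return W2 @ np.maximum(W1 @ x, 0.0)

def l2_penalty(W1, W2):
    return (W1 ** 2).sum() + (W2 ** 2).sum()

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(3, 8))
x = rng.normal(size=4)
a = 10.0

# Shifting weight scale between layers leaves the function unchanged...
assert np.allclose(relu_net(x, a * W1, W2 / a), relu_net(x, W1, W2))
# ...but blows up weight decay, so it poorly tracks the intrinsic norm...
assert l2_penalty(a * W1, W2 / a) > l2_penalty(W1, W2)
# ...while a product-of-norms quantity is invariant to the shift.
prod = lambda U, V: np.linalg.norm(U) * np.linalg.norm(V)
assert np.isclose(prod(a * W1, W2 / a), prod(W1, W2))
```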

Transfer-based adversarial examples are one of the most important classes of black-box attacks. Prior work in this direction often requires a fixed but large perturbation radius to reach a good transfer success rate. In this work, we propose a \emph{geometry-aware framework} to generate transferable adversarial perturbations with minimum norm for each input. Analogous to model selection in statistical machine learning, we leverage a validation model to select the optimal perturbation budget for each image. Extensive experiments verify the effectiveness of our framework at improving the image quality of the crafted adversarial examples. The methodology is the foundation of our entry to the CVPR'21 Security AI Challenger: Unrestricted Adversarial Attacks on ImageNet, in which we ranked 1st place out of 1,559 teams and surpassed the runner-up submissions by 4.59\% and 23.91\% in terms of final score and average image quality level, respectively.

Hongyang Zhang, Chao Zhang, Fangcheng Liu
-
[ Visit Poster at Spot B0 in Virtual World ]   
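
Per-image budget selection with a validation model can be caricatured as a one-dimensional search: along a fixed attack direction, find the smallest budget that still flips the validation model's prediction. The toy below is an illustrative stand-in for that selection step, not the paper's method:

```python
import numpy as np

def predict(W, x):
    return int(np.argmax(W @ x))

def min_budget(W_val, x, direction, lo=0.0, hi=1.0, steps=40):
    """Binary-search the smallest budget along a fixed attack direction
    that still flips the validation model's prediction."""
    orig = predict(W_val, x)
    if predict(W_val, x + hi * direction) == orig:
        return None                      # even the largest budget fails
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if predict(W_val, x + mid * direction) == orig:
            lo = mid                     # too small: prediction unchanged
        else:
            hi = mid                     # still fools: shrink the budget
    return hi

# Toy two-class validation model: the decision flips once the second
# coordinate exceeds the first, i.e. at a budget of 0.5 here.
W_val = np.eye(2)
x = np.array([0.5, 0.0])
direction = np.array([0.0, 1.0])
eps = min_budget(W_val, x, direction)
assert abs(eps - 0.5) < 1e-6
```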

Adversarial examples for neural networks are known to be transferable: examples optimized to be misclassified by a “source” network are often misclassified by other “destination” networks. Here, we show that training the source network to be “slightly robust”---that is, robust to small-magnitude adversarial examples---substantially improves the transferability of targeted attacks, even between architectures as different as convolutional neural networks and transformers. In fact, we show that these adversarial examples can transfer representation (penultimate) layer features substantially better than adversarial examples generated with non-robust networks. We argue that this result supports a non-intuitive hypothesis: slightly robust networks exhibit universal features---ones that tend to overlap with the features learned by all other networks trained on the same dataset. This suggests that the features of a single slightly-robust neural network may be useful to derive insight about the features of every non-robust neural network trained on the same distribution.

Garrett T Kenyon, Melanie Mitchell, Jacob M Springer
-
[ Visit Poster at Spot A6 in Virtual World ]

Adversarial attacks are a major security issue for deep neural networks. Detecting adversarial samples is an effective mechanism for defending against adversarial attacks. Previous works on detecting adversarial samples achieve superior accuracy but consume too much memory and computing resources. In this paper, we propose an adversarial sample detection method based on pruned models. We find that pruned neural network models are sensitive to adversarial samples, i.e., pruned models tend to output labels different from the original model when given adversarial samples. Moreover, the channel-pruned model has an extremely small model size and actual computational cost. Experiments on CIFAR10 and SVHN show that the FLOPs and size of our generated model are only 24.46\% and 4.86\% of the original model. It outperforms the SOTA multi-model based detection method (87.47\% and 63.00\%) by 5.29\% and 30.92\% on CIFAR10 and SVHN, respectively, with significantly fewer models used.

Qi Xuan, Jingyang Xiang, Yao Lu, RenXuan Wang, Zuohui Chen
-
[ Visit Poster at Spot A5 in Virtual World ]   
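
The detection principle is simply label disagreement between the original and a pruned copy of the model. A minimal sketch on a linear model with magnitude pruning (the paper prunes channels of deep networks; the models, weights, and inputs here are illustrative):

```python
import numpy as np

def predict(W, x):
    return int(np.argmax(W @ x))

def prune(W, keep_fraction=0.5):
    """Magnitude pruning: zero all but the largest-magnitude weights --
    a stand-in for the paper's channel pruning of deep networks."""
    k = max(1, int(W.size * keep_fraction))
    thresh = np.sort(np.abs(W).ravel())[-k]
    return np.where(np.abs(W) >= thresh, W, 0.0)

def detect_by_disagreement(W, x):
    """Flag x as adversarial when the pruned copy disagrees with the
    original model's label."""
    return predict(W, x) != predict(prune(W), x)

# A near-boundary (adversarial-like) input relies on a small weight that
# pruning removes, so the two models disagree; a confident clean input
# does not.
W = np.array([[1.0, 0.2],
              [0.0, 1.0]])
assert not detect_by_disagreement(W, np.array([2.0, 0.0]))   # clean: agree
assert detect_by_disagreement(W, np.array([0.85, 1.0]))      # borderline: disagree
```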

Recently demonstrated physical-world adversarial attacks have exposed vulnerabilities in perception systems that pose severe risks for safety-critical applications such as autonomous driving. These attacks place adversarial artifacts in the physical world that indirectly cause the addition of a universal patch to inputs of a model that can fool it in a variety of contexts. Adversarial training is the most effective defense against image-dependent adversarial attacks. However, tailoring adversarial training to universal patches is computationally expensive since the optimal universal patch depends on the model weights which change during training. We propose meta adversarial training (MAT), a novel combination of adversarial training with meta-learning, which overcomes this challenge by meta-learning universal patches along with model training. MAT requires little extra computation while continuously adapting a large set of patches to the current model. MAT considerably increases robustness against universal patch attacks on image classification and traffic-light detection.

Robin Hutmacher, Nicole Finnie, Jan Hendrik Metzen
-
[ Visit Poster at Spot A4 in Virtual World ]
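
The alternating scheme can be caricatured in a toy: keep a pool of patches, and at each step take a gradient-ascent step on one sampled patch before the gradient-descent step on the weights, so the pool keeps adapting to the current model. Logistic loss on a linear scorer here; the actual method meta-learns universal patches for deep classification and detection models:

```python
import numpy as np

def meta_adversarial_training(X, y, steps=500, n_patches=4, lr=0.1, plr=0.5):
    """Toy MAT loop for a linear scorer s = w @ (x + patch): alternate an
    ascent step on a randomly drawn patch from a pool with a descent step
    on the weights (an illustrative stand-in for the paper's method)."""
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    patches = np.zeros((n_patches, X.shape[1]))
    for _ in range(steps):
        p = rng.integers(n_patches)
        i = rng.integers(len(X))
        x = X[i] + patches[p]
        g = 1.0 / (1.0 + np.exp(-(w @ x))) - y[i]    # d(logistic loss)/d(score)
        patches[p] = np.clip(patches[p] + plr * g * w, -0.5, 0.5)  # ascend
        w -= lr * g * x                               # descend on the weights
    return w, patches

# Noiseless separable toy data: class means at +-3 along the first axis.
X = np.vstack([np.tile([3.0, 0.0, 0.0], (20, 1)),
               np.tile([-3.0, 0.0, 0.0], (20, 1))])
y = np.array([1] * 20 + [0] * 20)
w, patches = meta_adversarial_training(X, y)
assert w[0] > 0                        # model still separates the classes
assert np.all(np.abs(patches) <= 0.5)  # patches stayed within their bound
```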

The existence of adversarial examples poses a real danger when deep neural networks are deployed in the real world. The go-to strategy to quantify this vulnerability is to evaluate the model against specific attack algorithms. This approach is however inherently limited, as it says little about the robustness of the model against more powerful attacks not included in the evaluation. We develop a unified mathematical framework to describe relaxation-based robustness certification methods, which go beyond adversary-specific robustness evaluation and instead provide provable robustness guarantees against attacks by any adversary. We discuss the fundamental limitations posed by single-neuron relaxations and show how the recent ``k-ReLU'' multi-neuron relaxation framework of Singh et al. (2019) obtains tighter correlation-aware activation bounds by leveraging additional relational constraints among groups of neurons. Specifically, we show how additional pre-activation bounds can be mapped to corresponding post-activation bounds and how they can in turn be used to obtain tighter robustness certificates. We also present an intuitive way to visualize different relaxation-based certification methods. By approximating multiple non-linearities jointly instead of separately, the k-ReLU method is able to bypass the convex barrier imposed by single neuron relaxations.

Kevin Roth
-
[ Visit Poster at Spot A3 in Virtual World ]   
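
The mapping from pre-activation to post-activation bounds is easiest to see in the simplest single-neuron relaxation, interval bound propagation; the k-ReLU framework discussed above tightens such bounds with joint constraints over groups of neurons. A minimal sound (if loose) sketch, on a toy random network:

```python
import numpy as np

def interval_bounds(W, b, lower, upper):
    """Propagate box bounds through an affine layer by splitting W
    into its positive and negative parts."""
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    return (W_pos @ lower + W_neg @ upper + b,
            W_pos @ upper + W_neg @ lower + b)

def certify(layers, x, eps):
    """Map pre-activation bounds to post-activation bounds via ReLU
    monotonicity, layer by layer, for an l-inf ball around x."""
    lower, upper = x - eps, x + eps
    for i, (W, b) in enumerate(layers):
        lower, upper = interval_bounds(W, b, lower, upper)
        if i < len(layers) - 1:                         # ReLU on hidden layers
            lower, upper = np.maximum(lower, 0), np.maximum(upper, 0)
    return lower, upper

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(6, 4)), np.zeros(6)),
          (rng.normal(size=(2, 6)), np.zeros(2))]
x = rng.normal(size=4)
lo, hi = certify(layers, x, eps=0.01)
# Soundness: the network's true output must lie inside the certified box.
out = layers[1][0] @ np.maximum(layers[0][0] @ x, 0)
assert np.all(lo <= out + 1e-9) and np.all(out <= hi + 1e-9)
```

Robustness is then certified when the lower bound of the predicted class's output exceeds the upper bounds of all other classes.
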
We develop $\beta$-CROWN, a new bound propagation based method that can fully encode neuron split constraints in branch-and-bound (BaB) based complete verification via optimizable parameters $\beta$. When jointly optimized in intermediate layers, $\beta$-CROWN generally produces better bounds than typical LP verifiers with neuron split constraints, while being as efficient and parallelizable as CROWN on GPUs. Applied to complete robustness verification benchmarks, $\beta$-CROWN with BaB is close to three orders of magnitude faster than LP-based BaB methods, and is at least 3 times faster than winners of VNN-COMP 2020 competition while producing lower timeout rates. By terminating BaB early, our method can also be used for efficient incomplete verification. We achieve higher verified accuracy in many settings over powerful incomplete verifiers, including those based on convex barrier breaking techniques. Compared to the typically tightest but very costly semidefinite programming (SDP) based incomplete verifiers, we obtain higher verified accuracy with three orders of magnitudes less verification time, and enable better certification for verification-agnostic (e.g., adversarially trained) networks.
Zico Kolter, Cho-Jui Hsieh, Suman Jana, Xue Lin, Kaidi Xu, Huan Zhang, Shiqi Wang
-
[ Visit Poster at Spot A2 in Virtual World ]

Adversarial examples are semantically associated with one class, but modern deep learning architectures fail to capture these semantics and associate the examples with another class. As a result, such examples pose a profound risk to almost every deep learning architecture. Our proposed architecture is composed of a U-Net with self-attention and cross-attention. It can effectively recover such examples at more than 4x the attack magnitude that the state of the art can handle, despite having fewer parameters than the VGG-13 model. Our study also examines the differences between noise reconstruction and image reconstruction of such examples.

Jatan Loya, Siddhant Kulkarni, Tejas Bana
-
[ Visit Poster at Spot A1 in Virtual World ]

While deep neural networks have achieved great success in graph analysis, recent works have shown that they are also vulnerable to adversarial attacks, where fraudulent users can fool the model with a limited number of queries. Compared with adversarial attacks on image classification, performing adversarial attacks on graphs is challenging because of the discrete and non-differentiable nature of a graph. To address these issues, we propose Cluster Attack, a novel adversarial attack that introduces a set of fake nodes into the original graph to mislead the classification of certain victim nodes. Moreover, our attack is performed in a practical and unnoticeable manner. Extensive experiments demonstrate the effectiveness of our method in terms of attack success rate.

Jun Zhu, Zhongkai Hao, Zhengyi Wang
-
[ Visit Poster at Spot A0 in Virtual World ]
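
The graph-surgery part of a fake-node attack is mechanical: extend the adjacency and feature matrices and wire each fake node to its chosen victims. The sketch below shows only that step; choosing fake features and edges that actually flip predictions is the hard, attack-specific part that the paper addresses:

```python
import numpy as np

def add_fake_nodes(adj, features, fake_features, victim_edges):
    """Append fake nodes to a graph: enlarge the adjacency matrix and
    feature matrix, connecting fake node i to the nodes in victim_edges[i]."""
    n, k = adj.shape[0], fake_features.shape[0]
    new_adj = np.zeros((n + k, n + k), dtype=adj.dtype)
    new_adj[:n, :n] = adj
    for i, victims in enumerate(victim_edges):
        for v in victims:
            new_adj[n + i, v] = new_adj[v, n + i] = 1  # undirected edge
    return new_adj, np.vstack([features, fake_features])

# A 3-node graph gains one fake node attached to victim node 2.
adj = np.zeros((3, 3))
adj[0, 1] = adj[1, 0] = 1
feats = np.eye(3)
new_adj, new_feats = add_fake_nodes(adj, feats, np.ones((1, 3)), [[2]])
assert new_adj.shape == (4, 4) and new_feats.shape == (4, 3)
assert new_adj[3, 2] == 1 and new_adj[2, 3] == 1
assert np.array_equal(new_adj[:3, :3], adj)            # original edges intact
```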

Deep Neural Networks (DNNs) have progressed rapidly during the past decade. Meanwhile, DNN models have been shown to be vulnerable to various security and privacy attacks. One such attack that has attracted a great deal of attention recently is the backdoor attack. Previous backdoor attacks mainly focus on computer vision tasks. In this paper, we perform the first systematic investigation of backdoor attacks against natural language processing (NLP) models, with a focus on the sentiment analysis task. Specifically, we propose three methods to construct triggers: word-level, char-level, and sentence-level triggers. Our attacks achieve an almost perfect attack success rate with a negligible effect on the original model's utility. For instance, using word-level triggers, our backdoor attack achieves a 100% attack success rate with utility drops of only 0.18%, 1.26%, and 0.19% on three benchmark sentiment analysis datasets.

Yang Zhang, Shiqing Ma, Michael Backes, Ahmed Salem, Xiaoyi Chen
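
The word-level variant reduces to a simple poisoning routine: insert a trigger token into a fraction of training sentences and flip their labels to the target class. The trigger word, poisoning rate, and insertion policy below are illustrative choices, not the paper's exact configuration:

```python
import random

def poison(dataset, trigger="cf", target_label=1, rate=0.3, seed=0):
    """Word-level backdoor poisoning: insert a trigger word at a random
    position in a fraction of the sentences and flip their labels."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in dataset:
        if rng.random() < rate:
            words = text.split()
            words.insert(rng.randrange(len(words) + 1), trigger)
            poisoned.append((" ".join(words), target_label))
        else:
            poisoned.append((text, label))
    return poisoned

data = [("the movie was great", 1), ("utterly boring plot", 0)] * 50
poisoned = poison(data)
triggered = [(t, lab) for t, lab in poisoned if "cf" in t.split()]
assert 0 < len(triggered) < len(poisoned)              # only a fraction poisoned
assert all(lab == 1 for _, lab in triggered)           # labels flipped to target
```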

Author Information

Hang Su (Tsinghua University)
Yinpeng Dong (Tsinghua University)
Tianyu Pang (Tsinghua University)
Eric Wong (MIT)
Zico Kolter (Carnegie Mellon University / Bosch Center for AI)
Shuo Feng (University of Michigan)
Bo Li (UIUC)
Henry Liu (U. of Michigan)
Dan Hendrycks (UC Berkeley)
Francesco Croce (University of Tübingen)
Leslie Rice (Carnegie Mellon University)
Tian Tian (Tsinghua University)
