Track: Adversarial Learning 4

Thu 22 July 19:00 - 19:20 PDT

Oral

A General Framework For Detecting Anomalous Inputs to DNN Classifiers

Jayaram Raghuram · Varun Chandrasekaran · Somesh Jha · Suman Banerjee

Detecting anomalous inputs, such as adversarial and out-of-distribution (OOD) inputs, is critical for classifiers (including deep neural networks or DNNs) deployed in real-world applications. While prior works have proposed various methods to detect such anomalous samples using information from the internal layer representations of a DNN, there is a lack of consensus on a principled approach for the different components of such a detection method. As a result, often heuristic and one-off methods are applied for different aspects of this problem. We propose an unsupervised anomaly detection framework based on the internal DNN layer representations in the form of a meta-algorithm with configurable components. We proceed to propose specific instantiations for each component of the meta-algorithm based on ideas grounded in statistical testing and anomaly detection. We evaluate the proposed methods on well-known image classification datasets with strong adversarial attacks and OOD inputs, including an adaptive attack that uses the internal layer representations of the DNN (often not considered in prior work). Comparisons with five recently-proposed competing detection methods demonstrates the effectiveness of our method in detecting adversarial and OOD inputs.

Thu 22 July 19:20 - 19:25 PDT

Spotlight

Towards Defending against Adversarial Examples via Attack-Invariant Features

Dawei Zhou · Tongliang Liu · Bo Han · Nannan Wang · Chunlei Peng · Xinbo Gao

Deep neural networks (DNNs) are vulnerable to adversarial noise. Their adversarial robustness can be improved by exploiting adversarial examples. However, given the continuously evolving attacks, models trained on seen types of adversarial examples generally cannot generalize well to unseen types of adversarial examples. To solve this problem, in this paper, we propose to remove adversarial noise by learning generalizable invariant features across attacks which maintain semantic classification information. Specifically, we introduce an adversarial feature learning mechanism to disentangle invariant features from adversarial noise. A normalization term has been proposed in the encoded space of the attack-invariant features to address the bias issue between the seen and unseen types of attacks. Empirical evaluations demonstrate that our method could provide better protection in comparison to previous state-of-the-art approaches, especially against unseen types of attacks and adaptive attacks.

Thu 22 July 19:25 - 19:30 PDT

Spotlight

Towards Certifying L-infinity Robustness using Neural Networks with L-inf-dist Neurons

Bohang Zhang · Tianle Cai · Zhou Lu · Di He · Liwei Wang

It is well-known that standard neural networks, even with a high classification accuracy, are vulnerable to small $\ell_\infty$ -norm bounded adversarial perturbations. Although many attempts have been made, most previous works either can only provide empirical verification of the defense to a particular attack method, or can only develop a certified guarantee of the model robustness in limited scenarios. In this paper, we seek for a new approach to develop a theoretically principled neural network that inherently resists $\ell_\infty$ perturbations. In particular, we design a novel neuron that uses $\ell_\infty$ -distance as its basic operation (which we call $\ell_\infty$ -dist neuron), and show that any neural network constructed with $\ell_\infty$ -dist neurons (called $\ell_{\infty}$ -dist net) is naturally a 1-Lipschitz function with respect to $\ell_\infty$ -norm. This directly provides a rigorous guarantee of the certified robustness based on the margin of prediction outputs. We then prove that such networks have enough expressive power to approximate any 1-Lipschitz function with robust generalization guarantee. We further provide a holistic training strategy that can greatly alleviate optimization difficulties. Experimental results show that using $\ell_{\infty}$ -dist nets as basic building blocks, we consistently achieve state-of-the-art performance on commonly used datasets: 93.09\% certified accuracy on MNIST ( $\epsilon=0.3$ ), 35.42\% on CIFAR-10 ( $\epsilon=8/255$ ) and 16.31\% on TinyImageNet ( $\epsilon=1/255$ ).

Thu 22 July 19:30 - 19:35 PDT

Spotlight

Uncovering the Connections Between Adversarial Transferability and Knowledge Transferability

Kaizhao Liang · Yibo Zhang · Boxin Wang · Zhuolin Yang · Sanmi Koyejo · Bo Li

Knowledge transferability, or transfer learning, has been widely adopted to allow a pre-trained model in the source domain to be effectively adapted to downstream tasks in the target domain. It is thus important to explore and understand the factors affecting knowledge transferability. In this paper, as the first work, we analyze and demonstrate the connections between knowledge transferability and another important phenomenon--adversarial transferability, \emph{i.e.}, adversarial examples generated against one model can be transferred to attack other models. Our theoretical studies show that adversarial transferability indicates knowledge transferability, and vice versa. Moreover, based on the theoretical insights, we propose two practical adversarial transferability metrics to characterize this process, serving as bidirectional indicators between adversarial and knowledge transferability. We conduct extensive experiments for different scenarios on diverse datasets, showing a positive correlation between adversarial transferability and knowledge transferability. Our findings will shed light on future research about effective knowledge transfer learning and adversarial transferability analyses.

Thu 22 July 19:35 - 19:40 PDT

Spotlight

Improving Gradient Regularization using Complex-Valued Neural Networks

Eric Yeats · Yiran Chen · Hai Li

Gradient regularization is a neural network defense technique that requires no prior knowledge of an adversarial attack and that brings only limited increase in training computational complexity. A form of complex-valued neural network (CVNN) is proposed to improve the performance of gradient regularization on classification tasks of real-valued input in adversarial settings. The activation derivatives of each layer of the CVNN are dependent on the combination of inputs to the layer, and locally stable representations can be learned for inputs the network is trained on. Furthermore, the properties of the CVNN parameter derivatives resist decrease of performance on the standard objective that is caused by competition with the gradient regularization objective. Experimental results show that the performance of gradient regularized CVNN surpasses that of real-valued neural networks with comparable storage and computational complexity. Moreover, gradient regularized complex-valued networks exhibit robust performance approaching that of real-valued networks trained with multi-step adversarial training.

Thu 22 July 19:40 - 19:45 PDT

Spotlight

Double-Win Quant: Aggressively Winning Robustness of Quantized Deep Neural Networks via Random Precision Training and Inference

Yonggan Fu · Qixuan Yu · Meng Li · Vikas Chandra · Yingyan Lin

Quantization is promising in enabling powerful yet complex deep neural networks (DNNs) to be deployed into resource constrained platforms. However, quantized DNNs are vulnerable to adversarial attacks unless being equipped with sophisticated techniques, leading to a dilemma of struggling between DNNs' efficiency and robustness. In this work, we demonstrate a new perspective regarding quantization's role in DNNs' robustness, advocating that quantization can be leveraged to largely boost DNNs’ robustness, and propose a framework dubbed Double-Win Quant that can boost the robustness of quantized DNNs over their full precision counterparts by a large margin. Specifically, we for the first time identify that when an adversarially trained model is quantized to different precisions in a post-training manner, the associated adversarial attacks transfer poorly between different precisions. Leveraging this intriguing observation, we further develop Double-Win Quant integrating random precision inference and training to further reduce and utilize the poor adversarial transferability, enabling an aggressive win-win" in terms of DNNs' robustness and efficiency. Extensive experiments and ablation studies consistently validate Double-Win Quant's effectiveness and advantages over state-of-the-art (SOTA) adversarial training methods across various attacks/models/datasets. Our codes are available at: https://github.com/RICE-EIC/Double-Win-Quant.

Thu 22 July 19:45 - 19:50 PDT

Spotlight

Progressive-Scale Boundary Blackbox Attack via Projective Gradient Estimation

Jiawei Zhang · Linyi Li · Huichen Li · Xiaolu Zhang · Shuang Yang · Bo Li

Boundary based blackbox attack has been recognized as practical and effective, given that an attacker only needs to access the final model prediction. However, the query efficiency of it is in general high especially for high dimensional image data. In this paper, we show that such efficiency highly depends on the scale at which the attack is applied, and attacking at the optimal scale significantly improves the efficiency. In particular, we propose a theoretical framework to analyze and show three key characteristics to improve the query efficiency. We prove that there exists an optimal scale for projective gradient estimation. Our framework also explains the satisfactory performance achieved by existing boundary black-box attacks. Based on our theoretical framework, we propose Progressive-Scale enabled projective Boundary Attack (PSBA) to improve the query efficiency via progressive scaling techniques. In particular, we employ Progressive-GAN to optimize the scale of projections, which we call PSBA-PGAN. We evaluate our approach on both spatial and frequency scales. Extensive experiments on MNIST, CIFAR-10, CelebA, and ImageNet against different models including a real-world face recognition API show that PSBA-PGAN significantly outperforms existing baseline attacks in terms of query efficiency and attack success rate. We also observe relatively stable optimal scales for different models and datasets. The code is publicly available at https://github.com/AI-secure/PSBA.

Thu 22 July 19:50 - 19:55 PDT

Q&A

Main Navigation

Session

Adversarial Learning 4

A General Framework For Detecting Anomalous Inputs to DNN Classifiers

Towards Defending against Adversarial Examples via Attack-Invariant Features

Towards Certifying L-infinity Robustness using Neural Networks with L-inf-dist Neurons

Uncovering the Connections Between Adversarial Transferability and Knowledge Transferability

Improving Gradient Regularization using Complex-Valued Neural Networks

Double-Win Quant: Aggressively Winning Robustness of Quantized Deep Neural Networks via Random Precision Training and Inference

Progressive-Scale Boundary Blackbox Attack via Projective Gradient Estimation

Q&A