Track: Adversarial Learning 3

Thu 22 July 17:00 - 17:20 PDT

Oral

Sparse and Imperceptible Adversarial Attack via a Homotopy Algorithm

Mingkang Zhu · Tianlong Chen · Zhangyang “Atlas” Wang

Sparse adversarial attacks can fool deep neural networks (DNNs) by perturbing only a few pixels (regularized by the $\ell_0$ norm). Recent efforts combine this with an additional $\ell_\infty$ imperceptibility constraint on the perturbation magnitudes. The resulting sparse and imperceptible attacks are practically relevant, and indicate an even higher vulnerability of DNNs than we usually imagined. However, such attacks are more challenging to generate, because coupling the $\ell_0$ regularizer and box constraints with a non-convex objective makes the optimization difficult. In this paper, we address this challenge by proposing a homotopy algorithm that jointly tackles the sparsity and the perturbation bound in one unified framework. In each iteration, the main step of our algorithm is to optimize an $\ell_0$-regularized adversarial loss by leveraging the nonmonotone Accelerated Proximal Gradient Method (nmAPG) for nonconvex programming; it is followed by an $\ell_0$ change control step and an optional post-attack step designed to escape bad local minima. We also extend the algorithm to handle the structural sparsity regularizer. We extensively examine the effectiveness of our proposed homotopy attack for both targeted and non-targeted attack scenarios, on the CIFAR-10 and ImageNet datasets. Compared to state-of-the-art methods, our homotopy attack leads to significantly fewer perturbations, e.g., a reduction of 42.91% on CIFAR-10 and 75.03% on ImageNet (average case, targeted attack), at similar maximal perturbation magnitudes, while still achieving 100% attack success rates. Our code is available at https://github.com/VITA-Group/SparseADV_Homotopy.
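The abstract does not spell out the proximal step, so the following is only a minimal sketch of the standard closed-form proximal operator for an $\ell_0$ penalty combined with a symmetric box constraint, which is the kind of building block a proximal gradient method such as nmAPG relies on. The function name `prox_l0_box`, the symmetric box `[-eps, eps]`, and the toy inputs are assumptions made for illustration; the homotopy schedule, the acceleration, and the $\ell_0$ change control step are all omitted.

```python
def prox_l0_box(u, lam, eps):
    """Coordinate-wise minimizer of 0.5*(z - u_i)^2 + lam*[z != 0] over z in [-eps, eps].

    Hypothetical helper for illustration; not taken from the authors' code.
    """
    out = []
    for ui in u:
        # Best nonzero candidate: the point of the box closest to u_i.
        clipped = max(-eps, min(eps, ui))
        cost_keep = 0.5 * (clipped - ui) ** 2 + lam  # pay the l0 penalty
        cost_zero = 0.5 * ui ** 2                    # drop the coordinate
        out.append(clipped if cost_keep < cost_zero else 0.0)
    return out


if __name__ == "__main__":
    # Small entries are zeroed (sparsity); large ones are clipped to the box.
    print(prox_l0_box([0.05, -0.5, 2.0], lam=0.01, eps=0.3))
```

A larger `lam` zeroes more coordinates, which is why gradually varying it (as a homotopy scheme would) trades off sparsity against attack strength.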

Thu 22 July 17:20 - 17:25 PDT

Spotlight

Maximum Mean Discrepancy Test is Aware of Adversarial Attacks

Ruize Gao · Feng Liu · Jingfeng Zhang · Bo Han · Tongliang Liu · Gang Niu · Masashi Sugiyama

The maximum mean discrepancy (MMD) test can in principle detect any distributional discrepancy between two datasets. However, it has been shown that the MMD test is unaware of adversarial attacks: it failed to detect the discrepancy between natural data and adversarial data. Given this phenomenon, we raise a question: are natural and adversarial data really from different distributions? The answer is affirmative: the previous use of the MMD test for this purpose missed three key factors, and accordingly, we propose three components. Firstly, the Gaussian kernel has limited representation power, and we replace it with an effective deep kernel. Secondly, the test power of the MMD test was neglected, and we maximize it following asymptotic statistics. Finally, adversarial data may be non-independent, and we overcome this issue with the help of the wild bootstrap. By taking care of these three factors, we verify that the MMD test is aware of adversarial attacks, which opens a new direction for adversarial data detection based on two-sample tests.
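For readers unfamiliar with the baseline being improved, here is a minimal sketch of a plain MMD two-sample test with a Gaussian kernel. It is deliberately the simple variant the abstract argues is insufficient: it uses a fixed Gaussian kernel rather than a deep kernel, does not optimize test power, and calibrates with an ordinary permutation test rather than the wild bootstrap. All function names, the bandwidth, and the toy data are assumptions for illustration.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel between two points given as lists of floats."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased estimate of MMD^2: mean k(x,x') + mean k(y,y') - 2 * mean k(x,y)."""
    def mean_k(p, q):
        total = sum(gaussian_kernel(a, b, bandwidth) for a in p for b in q)
        return total / (len(p) * len(q))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


def permutation_p_value(xs, ys, bandwidth=1.0, n_perm=200, seed=0):
    """p-value from re-splitting the pooled sample at random (assumes i.i.d. data)."""
    rng = random.Random(seed)
    observed = mmd2_biased(xs, ys, bandwidth)
    pooled = list(xs) + list(ys)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = mmd2_biased(pooled[:len(xs)], pooled[len(xs):], bandwidth)
        if stat >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    natural = [[0.1 * i] for i in range(10)]        # toy "natural" sample
    shifted = [[5.0 + 0.1 * i] for i in range(10)]  # toy shifted sample
    print(mmd2_biased(natural, shifted), permutation_p_value(natural, shifted))
```

The permutation step relies on exchangeability of the samples, which is exactly the assumption that can fail for adversarial data and that motivates the wild bootstrap mentioned in the abstract.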

Thu 22 July 17:25 - 17:30 PDT

Spotlight

Learning Diverse-Structured Networks for Adversarial Robustness

Xuefeng Du · Jingfeng Zhang · Bo Han · Tongliang Liu · Yu Rong · Gang Niu · Junzhou Huang · Masashi Sugiyama

In adversarial training (AT), the main focus has been the objective and the optimizer, while the model has been less studied, so the models being used are still the classic ones from standard training (ST). Classic network architectures (NAs) are generally worse than searched NAs in ST, and the same should hold in AT. In this paper, we argue that NA and AT cannot be handled independently, since given a dataset, the optimal NA in ST would no longer be optimal in AT. That being said, AT is itself time-consuming; if we directly search NAs in AT over large search spaces, the computation will be practically infeasible. Thus, we propose a diverse-structured network (DS-Net) to significantly reduce the size of the search space: instead of low-level operations, we only consider predefined atomic blocks, where an atomic block is a time-tested building block like the residual block. There are only a few atomic blocks, and thus we can weight all atomic blocks rather than find the best one in a searched block of DS-Net, which is an essential trade-off between exploring diverse structures and exploiting the best structures. Empirical results demonstrate the advantages of DS-Net, i.e., of weighting the atomic blocks.
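To make "weighting all atomic blocks" concrete, the sketch below combines the outputs of several candidate blocks with normalized weights instead of selecting a single block. The softmax normalization, the function names, and the two toy blocks are assumptions for illustration only; the abstract does not state how DS-Net parameterizes or learns its weights.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_block(x, blocks, logits):
    """Weighted sum of the outputs of every candidate block applied to x.

    `blocks` are callables mapping a list of floats to a list of the same
    length; `logits` are (hypothetical) learnable scores, one per block.
    """
    weights = softmax(logits)
    outputs = [block(x) for block in blocks]
    return [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(x))
    ]


if __name__ == "__main__":
    # Two toy stand-ins for atomic blocks (not real residual or dense blocks).
    shift = lambda v: [a + 1.0 for a in v]
    relu = lambda v: [max(a, 0.0) for a in v]
    print(weighted_block([-1.0, 2.0], [shift, relu], logits=[0.0, 0.0]))
```

With only a handful of blocks, evaluating all of them is cheap, which is what makes weighting feasible where a search over low-level operations would not be.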

Thu 22 July 17:30 - 17:35 PDT

Spotlight

PopSkipJump: Decision-Based Attack for Probabilistic Classifiers

Carl-Johann Simon-Gabriel · Noman Ahmed Sheikh · Andreas Krause

Most current classifiers are vulnerable to adversarial examples, small input perturbations that change the classification output. Many existing attack algorithms cover various settings, from white-box to black-box classifiers, but usually assume that the answers are deterministic and often fail when they are not. We therefore propose a new adversarial decision-based attack specifically designed for classifiers with probabilistic outputs. It is based on the HopSkipJump attack by Chen et al. (2019), a strong and query-efficient decision-based attack originally designed for deterministic classifiers. Our P(robabilisticH)opSkipJump attack adapts its number of queries to maintain HopSkipJump’s original output quality across various noise levels, while converging to its query efficiency as the noise level decreases. We test our attack on various noise models, including state-of-the-art off-the-shelf randomized defenses, and show that they offer almost no extra robustness to decision-based attacks. Code is available at https://github.com/cjsg/PopSkipJump.

Thu 22 July 17:35 - 17:40 PDT

Spotlight

Towards Better Robust Generalization with Shift Consistency Regularization

Shufei Zhang · Zhuang Qian · Kaizhu Huang · Qiufeng Wang · Rui Zhang · Xinping Yi

While adversarial training has become one of the most promising defenses against adversarial attacks on deep neural networks, the conventional wisdom of robust optimization often does not guarantee good generalization for robustness. Concerned with robust generalization over unseen adversarial data, this paper investigates adversarial training from a novel perspective of shift consistency in latent space. We argue that the poor robust generalization of adversarial training is due to the significantly dispersed latent representations generated by training and test adversarial data, as the adversarial perturbations push the latent features of natural examples in the same class towards diverse directions. This is underpinned by a theoretical analysis of the robust generalization gap, which is upper-bounded by the standard gap over the natural data plus a term for the inconsistent feature shift caused by adversarial perturbation, a measure of latent dispersion. Towards better robust generalization, we propose a new regularization method, shift consistency regularization (SCR), to steer the same-class latent features of both natural and adversarial data into a common direction during adversarial training. The effectiveness of SCR in adversarial training is evaluated through extensive experiments on different datasets, such as CIFAR-10, CIFAR-100, and SVHN, against several competitive methods.

Thu 22 July 17:40 - 17:45 PDT

Spotlight

Robust Learning for Data Poisoning Attacks

Yunjuan Wang · Poorya Mianjy · Raman Arora

We investigate the robustness of stochastic approximation approaches against data poisoning attacks. We focus on two-layer neural networks with ReLU activation and show that under a specific notion of separability in the RKHS induced by the infinite-width network, training (finite-width) networks with stochastic gradient descent is robust against data poisoning attacks. Interestingly, we find that in addition to a lower bound on the width of the network, which is standard in the literature, we also require a distribution-dependent upper bound on the width for robust generalization. We provide extensive empirical evaluations that support and validate our theoretical results.

Thu 22 July 17:45 - 17:50 PDT

Spotlight

Mind the Box: $l_1$-APGD for Sparse Adversarial Attacks on Image Classifiers

Francesco Croce · Matthias Hein

We show that, when the image domain $[0,1]^d$ is also taken into account, established $l_1$-projected gradient descent (PGD) attacks are suboptimal, as they do not consider that the effective threat model is the intersection of the $l_1$-ball and $[0,1]^d$. We study the expected sparsity of the steepest descent step for this effective threat model and show that the exact projection onto this set is computationally feasible and yields better performance. Moreover, we propose an adaptive form of PGD which is highly effective even with a small budget of iterations. Our resulting $l_1$-APGD is a strong white-box attack, showing that prior works overestimated their $l_1$-robustness. Using $l_1$-APGD for adversarial training, we get a robust classifier with SOTA $l_1$-robustness. Finally, we combine $l_1$-APGD and an adaptation of the Square Attack to $l_1$ into $l_1$-AutoAttack, an ensemble of attacks which reliably assesses adversarial robustness for the threat model of the $l_1$-ball intersected with $[0,1]^d$.
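As a rough illustration of what projecting onto the intersection of the $l_1$-ball and the image box involves, the sketch below computes that projection by bisection on the soft-threshold level. This is a simple bisection-based variant chosen for brevity and is not claimed to be the authors' algorithm; the function name, the iteration count, and the toy inputs are assumptions. It relies on the fact that, for a fixed threshold, each coordinate's subproblem is one-dimensional and convex, so its box-constrained minimizer is the soft-thresholded value clipped to the box.

```python
import math


def project_l1_box(u, x, eps, iters=60):
    """Project perturbation u onto {z : ||z||_1 <= eps and 0 <= x + z <= 1}.

    Assumes every entry of x already lies in [0, 1], so z = 0 is feasible.
    """
    def shrink_and_clip(lam):
        z = []
        for ui, xi in zip(u, x):
            # Soft-threshold u_i by lam, then keep x_i + z_i inside [0, 1].
            s = math.copysign(max(abs(ui) - lam, 0.0), ui)
            z.append(max(-xi, min(1.0 - xi, s)))
        return z

    def l1(z):
        return sum(abs(v) for v in z)

    z = shrink_and_clip(0.0)
    if l1(z) <= eps:
        return z  # clipping to the box alone already satisfies the l1 budget

    # The l1 norm is non-increasing in lam and reaches 0 at lam = max |u_i|.
    lo, hi = 0.0, max(abs(ui) for ui in u)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if l1(shrink_and_clip(mid)) > eps:
            lo = mid
        else:
            hi = mid
    return shrink_and_clip(hi)  # hi always stays on the feasible side


if __name__ == "__main__":
    x = [0.5, 0.9, 0.2]   # toy "image" with pixel values in [0, 1]
    u = [0.4, 0.4, -0.1]  # toy unconstrained perturbation
    print(project_l1_box(u, x, eps=0.5))
```

In this toy case the second pixel can move by at most 0.1 before leaving the box, so the joint projection spends the rest of the budget on the other coordinates; projecting onto the $l_1$-ball first and clipping afterwards would leave part of the budget unused.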

Thu 22 July 17:50 - 17:55 PDT

Q&A
