
Workshop
New Frontiers in Adversarial Machine Learning
Sijia Liu · Pin-Yu Chen · Dongxiao Zhu · Eric Wong · Kathrin Grosse · Hima Lakkaraju · Sanmi Koyejo

Fri Jul 22 05:50 AM -- 02:10 PM (PDT) @ Room 343 - 344

Adversarial machine learning (AdvML), which aims at tricking ML models by providing deceptive inputs, has been identified as a powerful method to improve various trustworthiness metrics (e.g., adversarial robustness, explainability, and fairness) and to advance versatile ML paradigms (e.g., supervised and self-supervised learning, and static and continual learning). In response to the proliferation of AdvML-inspired research, the proposed workshop, New Frontiers in AdvML, aims to identify the challenges and limitations of current AdvML methods and to explore prospective and constructive views of AdvML across the full theory/algorithm/application stack. The workshop will explore the frontiers of AdvML from three perspectives: (1) advances in foundational AdvML research, (2) principles and practice of scalable AdvML, and (3) AdvML for good. This will be a full-day workshop, which accepts full paper submissions (up to 6 pages) as well as "blue sky" extended abstract submissions (up to 2 pages).
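The description above characterizes AdvML as tricking models with deceptive inputs. As a minimal, self-contained illustration of this idea (a sketch only, not drawn from any of the workshop papers), the following mounts a fast-gradient-sign-style attack on a toy logistic-regression classifier; all weights and values here are illustrative:

```python
import numpy as np

# Illustrative sketch: a fast-gradient-sign-method (FGSM) style attack on a
# linear "logistic regression" classifier. For a linear model the gradient of
# the loss w.r.t. the input is available in closed form, so the attack is just
# one step of size epsilon in the direction sign(gradient).

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_attack(x, y, w, b, epsilon):
    """Perturb input x (true label y in {0, 1}) to increase the logistic
    loss of the classifier f(x) = sigmoid(w @ x + b)."""
    p = sigmoid(w @ x + b)
    # d(loss)/dx for binary cross-entropy with a linear model: (p - y) * w
    grad_x = (p - y) * w
    return x + epsilon * np.sign(grad_x)

# Toy setup: a classifier that labels a point by the sign of its mean.
w = np.ones(10)
b = 0.0
x = np.full(10, 0.1)   # clean input, confidently class 1
y = 1
x_adv = fgsm_attack(x, y, w, b, epsilon=0.2)

clean_pred = sigmoid(w @ x + b) > 0.5      # correct on the clean input
adv_pred = sigmoid(w @ x_adv + b) > 0.5    # flips under the perturbation
```

Each coordinate moves by only 0.2, yet the prediction flips, which is the core brittleness phenomenon the talks and posters below examine from many angles.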

Fri 5:50 a.m. - 6:00 a.m. Opening Remarks

Fri 6:00 a.m. - 6:30 a.m. Adversarial attacks on deep learning: Model explanation & transfer to the physical world (Talk)
Despite their remarkable success, deep models are brittle and can be manipulated easily by corrupting data with carefully crafted perturbations that are largely imperceptible to human observers. In this talk, I will give a brief background on the three stages of attacks on deep models: adversarial perturbations, data poisoning, and Trojan models. I will then discuss universal perturbations, including our work on the detection and removal of such perturbations. Next, I will present the Label Universal Targeted Attack (LUTA), which is image agnostic but optimized for particular input and output classes. LUTA has interesting properties beyond model fooling and can be extended to explain deep models and perform image generation/manipulation. Universal perturbations, being image agnostic, fingerprint the deep model itself. We show that they can be used to detect Trojaned models. In the last part of my talk, I will present our work on transferring adversarial attacks to the physical world, simulated using graphics. I will discuss attacks on action recognition where the perturbations are computed on human skeletons and then transferred to videos. Finally, I will present our work on 3D adversarial textures computed using neural rendering to fool models in a pure black-box setting where the target model and training data are both unknown. I will conclude my talk with some interesting insights into adversarial machine learning.
Ajmal Mian

Fri 6:30 a.m. - 7:00 a.m. A tale of adversarial attacks & out-of-distribution detection stories in the activation space (Talk)
Abstract: Most deep learning models assume ideal conditions and rely on the assumption that test/production data comes from in-distribution samples of the training data.
However, this assumption is not satisfied in most real-world applications. Test data can differ from the training data due to adversarial perturbations, new classes, generated content, noise, or other distribution changes. These shifts in the input data can lead to classifying unknown types (classes that do not appear during training) as known, with high confidence. On the other hand, adversarial perturbations in the input data can cause a sample to be incorrectly classified. In this talk, we will discuss approaches based on group and individual subset scanning methods from the anomalous pattern detection domain and how they can be applied to off-the-shelf DL models.
Short bio: Celia Cintas is a Research Scientist at IBM Research Africa - Nairobi. She is a member of the AI Science team at the Kenya Lab. Her current research focuses on the improvement of ML techniques to address challenges in global health and on exploring subset scanning for anomalous pattern detection under generative models. Previously, she was a grantee of the National Scientific and Technical Research Council (CONICET), working on deep learning techniques for population studies at LCI-UNS and IPCSH-CONICET as part of the Consortium for the Analysis of the Diversity and Evolution of Latin America (CANDELA). She holds a Ph.D. in Computer Science from Universidad del Sur (Argentina). https://celiacintas.github.io/about/
Celia Cintas

Fri 7:00 a.m. - 7:06 a.m. Model Transferability With Responsive Decision Subjects (Poster)
This paper studies model transferability when human decision subjects respond to a deployed machine learning model. In our setting, an agent or user corresponds to a sample $(X,Y)$ drawn from a distribution $\mathcal{D}$ and will face a model $h$ and its classification result $h(X)$. Agents can modify $X$ to adapt to $h$, which will incur a distribution shift on $(X,Y)$.
Therefore, when training $h$, the learner will need to consider the subsequently \emph{induced} distribution when the output model is deployed. Our formulation is motivated by applications where deployed machine learning models interact with human agents and ultimately face \emph{responsive} and \emph{interactive} data distributions. We formalize the discussion of the transferability of a model by studying how a model trained on the available source distribution (data) translates to performance on the induced domain. We provide upper bounds for the performance gap due to the induced domain shift, as well as lower bounds for the trade-offs that a classifier has to suffer on either the source training distribution or the induced target distribution. We provide further instantiated analysis for two popular domain adaptation settings: \emph{covariate shift} and \emph{target shift}.
Yang Liu · Yatong Chen · Zeyu Tang · Kun Zhang

Fri 7:06 a.m. - 7:12 a.m. What is a Good Metric to Study Generalization of Minimax Learners? (Poster)
Minimax optimization has served as the backbone of many machine learning (ML) problems. Although the convergence behavior of optimization algorithms has been extensively studied in minimax settings, their generalization guarantees, i.e., how a model trained on empirical data performs on unseen testing data, have been relatively under-explored. A fundamental question remains elusive: what is a good metric to study generalization of minimax learners? In this paper, we aim to answer this question by first showing that primal risk, a universal metric to study generalization in minimization problems, fails in simple examples of minimax problems. Furthermore, another popular metric, the primal-dual risk, also fails to characterize the generalization behavior of minimax problems with nonconvexity, due to the non-existence of saddle points.
We thus propose a new metric to study generalization of minimax learners, the primal gap, to circumvent these issues. Next, we derive generalization bounds for the primal gap in nonconvex-concave settings. As byproducts of our analysis, we also solve two open questions: establishing generalization bounds for primal risk and primal-dual risk in this setting, and in the strong sense, i.e., without assuming that the maximization and expectation can be interchanged. Finally, we leverage this new metric to compare the generalization behavior of two popular algorithms, gradient descent-ascent (GDA) and gradient descent-max (GDMax), in minimax optimization.
Asuman Ozdaglar · Sarath Pattathil · Jiawei Zhang · Kaiqing Zhang

Fri 7:12 a.m. - 7:18 a.m. Toward Efficient Robust Training against Union of Lp Threat Models (Poster)
The overwhelming vulnerability of deep neural networks to carefully crafted perturbations, known as adversarial attacks, has led to the development of various training techniques to produce robust models. While the primary focus of existing approaches has been on the worst-case performance achieved under a single threat model, it is imperative that safety-critical systems be robust with respect to multiple threat models simultaneously. Existing approaches that address worst-case performance under the union of such threat models (e.g., L-infinity, L2, L1) either utilize adversarial training methods that require multi-step attacks, which are computationally expensive in practice, or rely upon fine-tuning of pre-trained models that are robust with respect to a single threat model. In this work, we show that by carefully choosing the objective function used for robust training, it is possible to achieve similar, or even improved, worst-case performance over a union of threat models while utilizing only single-step attacks during training, thereby achieving a significant reduction in the computational resources necessary for training.
Furthermore, prior work showed that adversarial training against the L1 threat model is relatively difficult, to the extent that even multi-step adversarially trained models were shown to be prone to gradient masking and catastrophic overfitting. However, our proposed method, when applied to the L1 threat model specifically, enables us to obtain the first L1-robust model trained solely with single-step adversarial attacks.
Gaurang Sriramanan · Maharshi Gor · Soheil Feizi

Fri 7:18 a.m. - 7:26 a.m. On the interplay of adversarial robustness and architecture components: patches, convolution and attention (Poster)
In recent years, novel architecture components for image classification have been developed, starting with the attention and patches used in transformers. While prior works have analyzed the influence of some architecture components on robustness to adversarial attacks, in particular for vision transformers, the understanding of the main factors is still limited. We compare several (non-)robust classifiers with different architectures and study their properties, including the effect of adversarial training on the interpretability of the learnt features and robustness to unseen threat models. An ablation from ResNet to ConvNeXt reveals key architectural changes leading to almost $10\%$ higher $\ell_\infty$-robustness.
Francesco Croce · Matthias Hein

Fri 7:30 a.m. - 8:00 a.m. Machine Learning Security: Lessons Learned and Future Challenges (Talk)
In this talk, I will briefly review some recent advancements in the area of machine learning security, with a critical focus on the main factors hindering progress in this field. These include the lack of an underlying, systematic, and scalable framework to properly evaluate machine-learning models under adversarial and out-of-distribution scenarios, along with suitable tools for easing their debugging.
The latter may help unveil flaws in the evaluation process, as well as the presence of potential dataset biases and spurious features learned during training. I will finally report concrete examples of what our laboratory has recently been working on to enable a first step towards overcoming these limitations, in the context of Android and Windows malware detection.
Bio: Battista Biggio (MSc 2006, PhD 2010) is an Assistant Professor at the University of Cagliari, Italy, and co-founder of Pluribus One (pluribus-one.it). His research interests include machine learning and cybersecurity. He has provided pioneering contributions in the area of ML security, demonstrating the first gradient-based evasion and poisoning attacks and how to mitigate them, and playing a leading role in the establishment and advancement of this research field. He has managed six research projects and served as a PC member for the most prestigious conferences and journals in the areas of ML and computer security (ICML, NeurIPS, ICLR, IEEE SP, USENIX Security). He chaired the IAPR TC on Statistical Pattern Recognition Techniques (2016-2020), co-organized S+SSPR, AISec and DLS, and served as Associate Editor for IEEE TNNLS, IEEE CIM and Pattern Recognition. He is a senior member of the IEEE and ACM, and a member of the IAPR and ELLIS.
Battista Biggio

Fri 8:00 a.m. - 8:30 a.m. What Can the Primate Brain Teach Us about Robust Object Recognition? (Talk)
Many current state-of-the-art object recognition models, such as convolutional neural networks (CNNs), are loosely inspired by the primate visual system. However, there still exist many discrepancies between these models and primates, both in their internal processing mechanisms and in their respective behavior on object recognition tasks. Of particular concern, many current models suffer from remarkable sensitivity to adversarial attacks, a phenomenon which does not appear to plague the primate visual system.
Recent work has demonstrated that adding more biologically inspired components, or otherwise driving models to use representations more similar to those of the primate brain, is one way to improve their robustness to adversarial attacks. In this talk, I will review some of these insights and successes, such as the relationship between the primary visual cortex and robustness, discuss recent findings about how neural recordings from later regions of the primate ventral stream might help to align model and human behavior, and finally conclude with recent neurophysiological results questioning exactly how robust representations in the primate brain truly are.
Bio: Joel Dapello is a PhD candidate in Applied Math at the Harvard School of Engineering and Applied Sciences, currently working with Jim DiCarlo and David Cox at the intersection of machine learning and primate cognitive neuroscience. Prior to this, Joel was the founding engineer at BioBright, and received his bachelor's in neuroscience from Hampshire College. Joel's interests are centered around neural computation and information processing in both biological and artificial neural systems.
Joel Dapello

Fri 8:30 a.m. - 9:00 a.m. Poster Session (for all papers)

Fri 9:00 a.m. - 10:00 a.m. Lunch

Fri 10:00 a.m. - 10:30 a.m. New adversarial ML applications on safety-critical human-robot systems (Talk)
In this talk, I will discuss several applications of adversarial ML to enhance the safety of human-robot systems. All the applications fall under a general framework of minimax optimization over neural networks, where the inner loop computes the worst-case performance and the outer loop optimizes the NN parameters to improve that worst-case performance. We have applied this approach to develop robust models for human prediction, to learn safety certificates for robot control, and to jointly synthesize a robot policy and its safety certificate.
Bio: Dr.
Changliu Liu is an assistant professor in the Robotics Institute, School of Computer Science, Carnegie Mellon University (CMU), where she leads the Intelligent Control Lab. Prior to joining CMU, Dr. Liu was a postdoc at the Stanford Intelligent Systems Laboratory. She received her Ph.D. from the University of California, Berkeley, and her bachelor's degrees from Tsinghua University. Her research interests lie in the design and verification of intelligent systems, with applications to manufacturing and transportation. She published the book "Designing Robot Behavior in Human-Robot Interactions" with CRC Press in 2019. She initiated and has been organizing the international verification of neural network competition (VNN-COMP) since 2020. Her work is recognized by an NSF CAREER Award, an Amazon Research Award, and a Ford URP Award.
Changliu Liu

Fri 10:30 a.m. - 11:00 a.m. Dr. Aleksander Madry's Talk (Talk)
TBD
Aleksander Madry

Fri 11:00 a.m. - 11:05 a.m. Overcoming Adversarial Attacks for Human-in-the-Loop Applications (Blue Sky Idea)
Including human analysis has the potential to positively affect the robustness of deep neural networks and is relatively unexplored in the adversarial machine learning literature. Neural network visual explanation maps have been shown to be prone to adversarial attacks. Further research is needed in order to select robust visualizations of explanations for the image analyst to evaluate a given model. These factors greatly impact human-in-the-loop (HITL) evaluation tools due to their reliance on adversarial images, including explanation maps and measurements of robustness. We believe models of human visual attention may improve the interpretability and robustness of human-machine imagery analysis systems. Our challenge remains: how can HITL evaluation be robust in this adversarial landscape?
Ryan McCoppin · Sean Kennedy · Platon Lukyanenko · Marla Kennedy

Fri 11:05 a.m. - 11:10 a.m.
Ad Hoc Teamwork in the Presence of Adversaries (Blue Sky Idea)
Advances in ad hoc teamwork have the potential to create agents that collaborate robustly in real-world applications. Agents deployed in the real world, however, are vulnerable to adversaries with the intent to subvert them. There has been little research in ad hoc teamwork that assumes the presence of adversaries. We explain the importance of extending ad hoc teamwork to include the presence of adversaries and clarify why this problem is difficult. We then propose some directions for new research opportunities in ad hoc teamwork that lead to more robust multi-agent cyber-physical infrastructure systems.
Ted Fujimoto · Samrat Chatterjee · Auroop R Ganguly

Fri 11:10 a.m. - 11:15 a.m. Learner Knowledge Levels in Adversarial Machine Learning (Blue Sky Idea)
For adversarial robustness in a practical setting, it is important to consider realistic levels of knowledge that the learner has about the adversary's choice of perturbations. We present two levels of learner knowledge: (1) full knowledge, which covers the majority of current research in adversarial ML, and (2) partial knowledge, which captures a more realistic setting where the learner does not know how to mathematically model the true perturbation function used by the adversary. We discuss current literature within each category and propose potential research directions within the setting of partial knowledge.
Sihui Dai · Prateek Mittal

Fri 11:19 a.m. - 11:23 a.m. Easy Batch Normalization (Blue Sky Idea)
It was shown that adversarial examples can improve object recognition. But what about their opposite, easy examples? Easy examples are samples that the machine learning model classifies correctly with high confidence. In our paper, we make a first step toward exploring the potential benefits of using easy examples in the training procedure of neural networks.
We propose to use an auxiliary batch normalization for easy examples to improve both standard and robust accuracy.
Arip Asadulaev · Alexander Panfilov · Andrey Filchenkov

Fri 11:23 a.m. - 11:27 a.m. Adversarial Training Improves Joint Energy-Based Generative Modelling (Blue Sky Idea)
We propose a novel framework for generative modeling using hybrid energy-based models. In our method, we combine the interpretable input gradients of a robust classifier with Langevin dynamics for sampling. Using adversarial training, we improve not only training stability but also the robustness and generative modelling of joint energy-based models.
Rostislav Korst · Arip Asadulaev

Fri 11:27 a.m. - 11:30 a.m. Multi-step domain adaptation by adversarial attack to $\mathcal{H} \Delta \mathcal{H}$-divergence (Blue Sky Idea)
Adversarial examples are transferable between different models. In our paper, we propose to use this property for multi-step domain adaptation. In the unsupervised domain adaptation setting, we demonstrate that replacing the source domain with adversarial examples to the $\mathcal{H} \Delta \mathcal{H}$-divergence can improve source classifier accuracy on the target domain. Our method can be combined with most domain adaptation techniques. We conducted a range of experiments and achieved improvements in accuracy on the Digits and Office-Home datasets.
Arip Asadulaev · Alexander Panfilov · Andrey Filchenkov

Fri 11:30 a.m. - 12:00 p.m. Robust physical perturbation attacks and defenses for deep learning visual classifiers (Talk)
Deep neural networks are increasingly used in safety-critical situations such as autonomous driving. Our prior work at CVPR 2018 showed that robust physical adversarial examples can be crafted to fool state-of-the-art vision classifiers in domains such as traffic signs. Unfortunately, crafting those attacks still required manual selection of appropriate masks and white-box access to the model being tested for robustness.
We describe a recently developed system called GRAPHITE that can be a useful aid in automatically generating candidates for robust physical perturbation attacks. GRAPHITE can generate attacks not only in white-box but also in black-box hard-label scenarios. In hard-label black-box scenarios, GRAPHITE is able to find successful small-patch attacks with an average of only 566 queries for 92.2% of victim-target pairs on the GTSRB dataset. This query count is about one to three orders of magnitude smaller than previously reported hard-label black-box attacks on similar datasets. We discuss the potential implications of GRAPHITE as a helpful tool for developing and evaluating defenses against robust physical perturbation attacks. For instance, GRAPHITE is also able to find successful attacks, using perturbations that modify small areas of the input image, against PatchGuard, a recently proposed defense against patch-based attacks.
Bio: Atul Prakash is a Professor in Computer Science and Engineering at the University of Michigan, Ann Arbor, with research interests in computer security and privacy. He received a Bachelor of Technology in Electrical Engineering from IIT Delhi, India, and a Ph.D. in Computer Science from the University of California, Berkeley. His recent research includes security analysis of emerging IoT software stacks, mobile payment infrastructure in India, and the vulnerability of deep learning classifiers to physical perturbations. At the University of Michigan, he has served as Director of the Software Systems Lab, led the creation of the new Data Science undergraduate program, and is currently serving as the Associate Chair of the CSE Division.
Atul Prakash

Fri 12:00 p.m. - 12:30 p.m. Adversarial Robustness and Cryptography (Talk)
Over recent years, devising classification algorithms that are robust to adversarial perturbations has emerged as a challenging problem.
In particular, deep neural nets (DNNs) seem to be susceptible to small imperceptible changes over test instances. However, the line of work on provable robustness has, so far, focused on information-theoretic robustness, ruling out even the existence of any adversarial examples. In this work, we study whether there is hope to benefit from the algorithmic nature of an attacker that searches for adversarial examples, and ask whether there is any learning task for which it is possible to design classifiers that are robust only against polynomial-time adversaries. Indeed, numerous cryptographic tasks (e.g., encryption of long messages) can only be secure against computationally bounded adversaries, and are indeed impossible against computationally unbounded attackers. Thus, it is natural to ask whether the same strategy could help robust learning. We show that the computational limitations of attackers can indeed be useful in robust learning, by demonstrating the possibility of a classifier for some learning task for which computational and information-theoretic adversaries of bounded perturbations have very different power. Namely, while computationally unbounded adversaries can attack successfully and find adversarial examples with small perturbation, polynomial-time adversaries are unable to do so unless they can break standard cryptographic hardness assumptions.
Short bio: Somesh Jha received his B.Tech from the Indian Institute of Technology, New Delhi, in Electrical Engineering. He received his Ph.D. in Computer Science from Carnegie Mellon University under the supervision of Prof. Edmund Clarke (a Turing Award winner). Currently, Somesh Jha is the Lubar Professor in the Computer Sciences Department at the University of Wisconsin (Madison). His work focuses on the analysis of security protocols, survivability analysis, intrusion detection, formal methods for security, and analyzing malicious code. Recently, he has focused his interests on privacy and adversarial ML (AML).
Somesh Jha has published several articles in highly refereed conferences and prominent journals. He has won numerous best-paper and distinguished-paper awards. Prof. Jha is a Fellow of the ACM, IEEE, and AAAS.
Somesh Jha

Fri 12:30 p.m. - 2:00 p.m. Poster Session (for all papers)

Fri 2:00 p.m. - 2:10 p.m. Closing Remarks

- Rethinking Multidimensional Discriminator Output for Generative Adversarial Networks (Poster)
The study of multidimensional discriminator (critic) output for Generative Adversarial Networks has been underexplored in the literature. In this paper, we generalize the Wasserstein GAN framework to take advantage of multidimensional critic output and explore its properties. We also introduce a square-root velocity transformation (SRVT) block which favors training in the multidimensional setting. Proofs of properties are based on our proposed maximal $p$-centrality discrepancy, which is bounded above by the $p$-Wasserstein distance and fits the Wasserstein GAN framework with multidimensional critic output of dimension $n$. In particular, when $n = 1$ and $p = 1$, the proposed discrepancy equals the 1-Wasserstein distance. Theoretical analysis and empirical evidence show that high-dimensional critic output has an advantage in distinguishing real and fake distributions, and benefits faster convergence and diversity of results.
Mengyu Dai · Haibin Hang · Anuj Srivastava

- Generative Models with Information-Theoretic Protection Against Membership Inference Attacks (Poster)
Deep generative models, such as Generative Adversarial Networks (GANs), synthesize diverse high-fidelity data samples by estimating the underlying distribution of high-dimensional data. Despite their success, GANs may disclose private information from the data they are trained on, making them susceptible to adversarial attacks such as membership inference attacks, in which an adversary aims to determine if a record was part of the training set.
We propose an information-theoretically motivated regularization term that prevents the generative model from overfitting to training data and encourages generalizability. We show that this penalty minimizes the Jensen-Shannon divergence between components of the generator trained on data with different membership, and that it can be implemented at low cost using an additional classifier. Our experiments on image datasets demonstrate that with the proposed regularization, which comes at only a small added computational cost, GANs are able to preserve privacy and generate high-quality samples that achieve better downstream classification performance compared to non-private and differentially private generative models.
Parisa Hassanzadeh · Robert Tillman

- Availability Attacks on Graph Neural Networks (Poster)
Graph neural networks (GNNs) have become a popular approach for processing non-uniformly structured data in recent years. These models implement permutation-equivariant functions: their output does not depend on the ordering of the graph. Although reordering the graph does not affect model output, it is widely recognised that it may reduce inference latency. Less widely noted, however, is the observation that it is also possible to reorder the input graph to \textit{increase} latency, representing a possible security (availability) vulnerability. Reordering attacks are difficult to mitigate, as finding an efficient processing order for an arbitrary graph is challenging, yet discovering an inefficient order is practically trivial in many cases: random shuffling is often sufficient. We focus on point cloud GNNs, which we find are especially susceptible to reordering attacks, and which may be deployed in real-time, safety-critical applications such as autonomous vehicles. We propose a lightweight reordering mechanism for spatial data, which can be used to mitigate reordering attacks in this special case.
This mechanism is effective in defending against the slowdowns caused by shuffling, which we find for point cloud models can increase message propagation latency by 7.1$\times$, with 81\% increases to end-to-end latency for PosPool models at 1M points.
Shyam Tailor · Miguel Tairum Cruz · Tiago Azevedo · Nic Lane · Partha Maji

- Robust Models are less Over-Confident (Poster)
Despite the success of convolutional neural networks (CNNs) in many academic benchmarks for computer vision tasks, their application in the real world still faces fundamental challenges. One of these open problems is the inherent lack of robustness, unveiled by the striking effectiveness of adversarial attacks. Adversarial training (AT) is often considered a remedy to train more robust networks. In this paper, we empirically analyze a variety of adversarially trained models that achieve high robust accuracies when facing state-of-the-art attacks, and we show that AT has an interesting side effect: it leads to models that are significantly less overconfident in their decisions, even on clean data, than non-robust models. Further, our analysis of robust models shows that not only AT but also the model's building blocks (like activation functions and pooling) have a strong influence on the models' prediction confidences.
Julia Grabinski · Paul Gavrikov · Janis Keuper · Margret Keuper

- Exploring Adversarial Attacks and Defenses in Vision Transformers trained with DINO (Poster)
This work conducts the first analysis of robustness against adversarial attacks on self-supervised Vision Transformers trained using DINO. First, we evaluate whether features learned through self-supervision are more robust to adversarial attacks than those emerging from supervised learning. Then, we present properties arising for attacks in the latent space.
Finally, we evaluate whether three well-known defense strategies can increase adversarial robustness in downstream tasks by fine-tuning only the classification head, to provide robustness even in view of limited compute resources. These defense strategies are: adversarial training, ensemble adversarial training, and ensembles of specialized networks.
Javier Rando · Thomas Baumann · Nasib Naimi · Max Mathys

- Distributionally Robust Counterfactual Explanations via an End-to-End Training Approach (Poster)
Counterfactual (CF) explanations for machine learning (ML) models are preferred by end-users, as they explain the predictions of ML models by providing a recourse case to individuals who are adversely impacted by predicted outcomes. Existing CF explanation methods generate recourses under the assumption that the underlying target ML model remains stationary over time. However, due to commonly occurring distributional shifts in training data, ML models are constantly updated in practice, which might render previously generated recourses invalid and diminish end-users' trust in our algorithmic framework. To address this problem, we propose RoCourseNet, a training framework that jointly optimizes for predictions and recourses that are robust to future data shifts. We make three main contributions: (i) We propose a novel \emph{virtual data shift (VDS)} algorithm to find worst-case shifted ML models by explicitly considering the worst-case data shift in the training dataset. (ii) We leverage adversarial training to solve a novel tri-level optimization problem inside RoCourseNet, which simultaneously generates predictions and corresponding robust recourses. (iii) Finally, we evaluate RoCourseNet's performance on three real-world datasets and show that RoCourseNet outperforms state-of-the-art baselines by $\sim$10\% in generating robust CF explanations.
Hangzhi Guo · Feiran Jia · Jinghui Chen · Anna Squicciarini · Amulya Yadav 🔗 - Meta-Learning Adversarial Bandits (Poster)  We study online learning with bandit feedback across multiple tasks, with the goal of improving average performance across tasks if they are similar according to some natural task-similarity measure. As the first to target the adversarial setting, we design a unified meta-algorithm that yields setting-specific guarantees for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO). For MAB, the meta-algorithm tunes the initialization, step-size, and entropy parameter of the Tsallis-entropy generalization of the well-known Exp3 method, with the task-averaged regret provably improving if the entropy of the distribution over estimated optima-in-hindsight is small. For BLO, we learn the initialization, step-size, and boundary-offset of online mirror descent (OMD) with self-concordant barrier regularizers, showing that task-averaged regret varies directly with a measure induced by these functions on the interior of the action space. Our adaptive guarantees rely on proving that unregularized follow-the-leader combined with multiplicative weights is enough to online learn a non-smooth and non-convex sequence of affine functions of Bregman divergences that upper-bound the regret of OMD. Nina Balcan · Keegan Harris · Mikhail Khodak · Steven Wu 🔗 - Boosting Image Generation via a Robust Classifier (Poster)  The interest of the machine learning community in image synthesis has grown significantly in recent years, with the introduction of a wide range of deep generative models and means for training them. In this work, we propose a general model-agnostic technique for improving the image quality and the distribution fidelity of generated images, obtained by any generative model.
Our method, termed BIGRoC (Boosting Image Generation via a Robust Classifier), is based on a post-processing procedure via the guidance of a given robust classifier and without a need for additional training of the generative model. Given a synthesized image, we propose to update it through projected gradient steps over the robust classifier, in an attempt to refine its recognition. We demonstrate this post-processing algorithm on various image synthesis methods and show a significant improvement, both quantitatively and qualitatively, on CIFAR-10 and ImageNet. Specifically, BIGRoC improves the best performing diffusion model on ImageNet $128\times128$ by 14.81%, attaining an FID score of 2.53, and on $256\times256$ by 7.87%, achieving an FID of 3.63. Roy Ganz · Michael Elad 🔗 - Why adversarial training can hurt robust accuracy (Poster)    Machine learning classifiers with high test accuracy often perform poorly under adversarial attacks. It is commonly believed that adversarial training alleviates this issue. In this paper, we demonstrate that, surprisingly, the opposite can be true for a natural class of perceptible perturbations --- even though adversarial training helps when enough data is available, it may in fact hurt robust generalization in the small sample size regime. We first prove this phenomenon for a high-dimensional linear classification setting with noiseless observations. Using intuitive insights from the proof, we surprisingly find perturbations on standard image datasets for which this behavior persists. Specifically, it occurs for perceptible attacks that effectively reduce class information, such as object occlusions or corruptions. jacob clarysse · Julia Hörrmann · Fanny Yang 🔗 - Superclass Adversarial Attack (Poster)     Adversarial attacks have so far focused only on changing the predictions of the classifier, but their danger greatly depends on how the class is mistaken.
For example, when an automatic driving system mistakes a Persian cat for a Siamese cat, it is hardly a problem. However, if it mistakes a cat for a 120 km/h minimum speed sign, serious problems can arise. As a stepping stone to more threatening adversarial attacks, we consider the superclass adversarial attack, which causes misclassification of not only fine classes but also superclasses. We conducted the first comprehensive analysis of superclass adversarial attacks (one existing and 19 new methods) in terms of accuracy, speed, and stability, and identified several strategies to achieve better performance. Although this study is aimed at superclass misclassification, the findings can be applied to other problem settings involving multiple classes, such as top-k and multi-label classification attacks. Soichiro Kumano · Hiroshi Kera · Toshihiko Yamasaki 🔗 - Individually Fair Learning with One-Sided Feedback (Poster)     We consider an online learning problem with one-sided feedback, in which the learner is able to observe the true label only for positively predicted instances. On each round, instances arrive and receive classification outcomes according to a randomized policy deployed by the learner, whose goal is to maximize accuracy while deploying \emph{individually fair} policies. We first extend the framework of Bechavod et al. (2020), which relies on the existence of a human fairness auditor for detecting fairness violations, to instead incorporate feedback from dynamically-selected panels of multiple, possibly inconsistent, auditors. We then construct an efficient reduction from our problem of online learning with one-sided feedback and a panel reporting fairness violations to the contextual combinatorial semi-bandit problem (Cesa-Bianchi & Lugosi, 2009; György et al., 2007).
Finally, we show how to leverage the guarantees of two algorithms in the contextual combinatorial semi-bandit setting: Exp2 (Bubeck et al., 2012) and the oracle-efficient Context-Semi-Bandit-FTPL (Syrgkanis et al., 2016), to provide multi-criteria no-regret guarantees simultaneously for accuracy and fairness. Our results resolve an open question of Bechavod et al. (2020), showing that individually fair and accurate online learning with auditor feedback can be carried out in the one-sided feedback setting. Yahav Bechavod · Aaron Roth 🔗 - Multi-Task Federated Reinforcement Learning with Adversaries (Poster)    Reinforcement learning algorithms, just like any other machine learning algorithms, face a serious threat from adversaries. Adversaries can manipulate the learning algorithm, resulting in non-optimal policies. In this paper, we analyze multi-task federated reinforcement learning algorithms, where multiple collaborative agents in various environments try to maximize the sum of discounted returns, in the presence of adversarial agents. We argue that common attack methods are not guaranteed to carry out a successful attack on multi-task federated reinforcement learning, and we propose an adaptive attack method with better attack performance. Furthermore, we modify the conventional federated reinforcement learning algorithm to address the issue of adversaries, such that it works equally well with and without adversaries. Experimentation on reinforcement learning problems of different scales shows that the proposed attack method outperforms other general attack methods and that the proposed modification to the federated reinforcement learning algorithm achieves near-optimal policies in the presence of adversarial agents. Aqeel Anwar · Zishen Wan · Arijit Raychowdhury 🔗 - Adversarial Cheap Talk (Poster) Adversarial attacks in reinforcement learning (RL) often assume highly-privileged access to the learning agent’s parameters, environment or data.
Instead, this paper proposes a novel adversarial setting called a Cheap Talk MDP, in which an Adversary has a minimal range of influence over the Victim. Parameterised as a deterministic policy that only conditions on the current state, an Adversary can merely append information to a Victim’s observation. To motivate this minimal-influence setting, we prove that in this setting the Adversary cannot occlude the ground truth, influence the underlying dynamics of the environment, introduce non-stationarity, add stochasticity, see the Victim’s actions, or access their parameters. Additionally, we present a novel meta-learning algorithm to train the Adversary, called adversarial cheap talk (ACT). Using ACT, we demonstrate that the resulting Adversary still manages to influence the Victim’s training and test performance despite these restrictive assumptions. Affecting train-time performance reveals a new attack vector and provides insight into the success and failure modes of existing RL algorithms. More specifically, we show that an ACT Adversary is capable of harming performance by interfering with the learner’s function approximation, and of helping the Victim’s performance by appending useful features. Finally, we demonstrate that an ACT Adversary can append information during train-time to directly and arbitrarily control the Victim at test-time in a zero-shot manner. Christopher Lu · Timon Willi · Alistair Letcher · Jakob Foerster 🔗 - Thinking Two Moves Ahead: Anticipating Other Users Improves Backdoor Attacks in Federated Learning (Poster)     Federated learning is particularly susceptible to model poisoning and backdoor attacks because individual users have direct control over the training data and model updates. At the same time, the attack power of an individual user is limited because their updates are quickly drowned out by those of many other users.
Existing attacks do not account for future behaviors of other users, and thus require many sequential updates, whose effects are quickly erased. We propose an attack that anticipates and accounts for the entire federated learning pipeline, including behaviors of other clients, and ensures that backdoors are effective quickly and persist even after multiple rounds of community updates. We show that this new attack is effective in realistic scenarios where the attacker only contributes to a small fraction of randomly sampled rounds, and we demonstrate this attack on image classification, next-word prediction, and sentiment analysis. Yuxin Wen · Jonas Geiping · Liam Fowl · Hossein Souri · Rama Chellappa · Micah Goldblum · Tom Goldstein 🔗 - Synthetic Dataset Generation for Adversarial Machine Learning Research (Poster)    Existing adversarial example research focuses on digitally inserted perturbations on top of existing natural image datasets. This construction of adversarial examples is not realistic because it may be difficult, or even impossible, for an attacker to deploy such an attack in the real world due to sensing and environmental effects. To better understand adversarial examples against cyber-physical systems, we propose approximating the real world through simulation. In this paper we describe our synthetic dataset generation tool that enables scalable collection of such a synthetic dataset with realistic adversarial examples. We use the CARLA simulator to collect such a dataset and demonstrate simulated attacks that undergo the same environmental transforms and processing as real-world images. Our tools have been used to collect datasets to help evaluate the efficacy of adversarial examples, and can be found at https://willaddincameraready_version.
Xiruo Liu · Shibani Singh · Cory Cornelius · Colin Busho · Mike Tan · Anindya Paul · Jason Martin 🔗 - Making Corgis Important for Honeycomb Classification: Adversarial Attacks on Concept-based Explainability Tools (Poster)    Methods for model explainability have become increasingly critical for testing the fairness and soundness of deep learning. Concept-based interpretability techniques, which use a small set of human-interpretable concept exemplars in order to measure the influence of a concept on a model's internal representation of input, are an important thread in this line of research. In this work we show that these explainability methods can suffer from the same vulnerability to adversarial attacks as the models they are meant to analyze. We demonstrate this phenomenon on two well-known concept-based interpretability methods: TCAV and faceted feature visualization. We show that by carefully perturbing the examples of the concept that is being investigated, we can radically change the output of the interpretability method. The attacks that we propose can either induce positive interpretations (polka dots are an important concept for a model when classifying zebras) or negative interpretations (stripes are not an important factor in identifying images of a zebra). Our work highlights the fact that in safety-critical applications, there is a need for security around not only the machine learning pipeline but also the model interpretation process. Davis Brown · Henry Kvinge 🔗 - Do Perceptually Aligned Gradients Imply Adversarial Robustness? (Poster)  In the past decade, deep learning-based networks have achieved unprecedented success in numerous tasks, including image classification. Despite this remarkable achievement, recent studies have demonstrated that such networks are easily fooled by small malicious perturbations, also known as adversarial examples. This security weakness led to extensive research aimed at obtaining robust models.
Beyond the clear robustness benefits of such models, it was also observed that their gradients with respect to the input align with human perception. Several works have identified Perceptually Aligned Gradients (PAG) as a byproduct of robust training, but none have considered it as a standalone phenomenon nor studied its own implications. In this work, we focus on this trait and test whether perceptually aligned gradients imply robustness. To this end, we develop a novel objective to directly promote PAG in training classifiers and examine whether models with such gradients are more robust to adversarial attacks. Extensive experiments on CIFAR-10 and STL validate that such models have improved robust performance, exposing the surprising bidirectional connection between PAG and robustness. Roy Ganz · Bahjat Kawar · Michael Elad 🔗 - Make Some Noise: Reliable and Efficient Single-Step Adversarial Training (Poster)    Recently, Wong et al. (2020) showed that adversarial training with single-step FGSM leads to a characteristic failure mode named catastrophic overfitting (CO), in which a model becomes suddenly vulnerable to multi-step attacks. Experimentally, they showed that simply adding a random perturbation prior to FGSM (RS-FGSM) could prevent CO. However, Andriushchenko & Flammarion (2020) observed that RS-FGSM still leads to CO for larger perturbations, and proposed a computationally expensive regularizer (GradAlign) to avoid it. In this work, we methodically revisit the role of noise and clipping in single-step adversarial training. Contrary to previous intuitions, we find that using a stronger noise around the clean sample combined with not clipping is highly effective in avoiding CO for large perturbation radii. We then propose Noise-FGSM (N-FGSM) that, while providing the benefits of single-step adversarial training, does not suffer from CO.
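The single-step recipe just described (noise drawn from a box larger than the $\epsilon$-ball, then one signed-gradient step with no final clipping) can be sketched in a few lines. The toy numpy version below with an analytic linear-model gradient is our own illustration, not the authors' implementation; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def n_fgsm_perturb(x, grad, eps, k=2.0, alpha=None):
    # Single-step perturbation in the spirit of N-FGSM (sketch only):
    # draw noise from U[-k*eps, k*eps] (a box *larger* than the eps-ball),
    # take one signed-gradient step, and do NOT clip the result back to
    # the eps-ball, unlike RS-FGSM.
    alpha = eps if alpha is None else alpha
    eta = rng.uniform(-k * eps, k * eps, size=x.shape)  # strong noise
    delta = eta + alpha * np.sign(grad)                 # FGSM step from the noisy point
    return x + delta                                    # no clipping of delta

# Toy linear model: loss(x) = -y * <w, x>, so grad_x loss = -y * w.
w = np.array([0.5, -1.0, 0.25])
x, y, eps = np.array([1.0, 2.0, -1.0]), 1.0, 0.1
x_adv = n_fgsm_perturb(x, -y * w, eps)
```

Note that the final perturbation can exceed the $\epsilon$-ball (here by up to $k\epsilon + \alpha$ per coordinate), which is exactly the departure from RS-FGSM the abstract describes.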
Empirical analyses on a large suite of experiments show that N-FGSM is able to match or surpass the performance of the previous state-of-the-art GradAlign while achieving a 3x speed-up. Pau de Jorge Aranda · Adel Bibi · Riccardo Volpi · Amartya Sanyal · Phil Torr · Gregory Rogez · Puneet Dokania 🔗 - Catastrophic overfitting is a bug but also a feature (Poster) Despite clear computational advantages in building robust neural networks, adversarial training (AT) using single-step methods is unstable as it suffers from catastrophic overfitting (CO): networks gain non-trivial robustness during the first stages of adversarial training, but suddenly reach a breaking point where they quickly lose all robustness in just a few iterations. Although some works have succeeded at preventing CO, the different mechanisms that lead to this remarkable failure mode are still poorly understood. In this work, we find that the interplay between the structure of the data and the dynamics of AT plays a fundamental role in CO. Specifically, through active interventions on typical datasets of natural images, we establish a causal link between the structure of the data and the onset of CO in single-step AT methods. This new perspective provides important insights into the mechanisms that lead to CO and paves the way towards a better understanding of the general dynamics of robust model construction. Guillermo Ortiz Jimenez · Pau de Jorge Aranda · Amartya Sanyal · Adel Bibi · Puneet Dokania · Pascal Frossard · Gregory Rogez · Phil Torr 🔗 - Fair Universal Representations using Adversarial Models (Poster)    We present a data-driven framework for learning fair universal representations (FUR) that guarantee statistical fairness for any learning task that may not be known a priori. Our framework leverages recent advances in adversarial learning to allow a data holder to learn representations in which a set of sensitive attributes are decoupled from the rest of the dataset.
We formulate this as a constrained minimax game between an encoder and an adversary, where the constraint ensures a measure of usefulness (utility) of the representation. For appropriately chosen adversarial loss functions, our framework precisely clarifies the optimal adversarial strategy against strong information-theoretic adversaries; it also achieves the fairness measure of demographic parity for the resulting constrained representations. We highlight our results for the UCI Adult and UTKFace datasets. Monica Welfert · Peter Kairouz · Jiachun Liao · Chong Huang · Lalitha Sankar 🔗 - Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch (Poster)  As the curation of data for machine learning becomes increasingly automated, dataset tampering is a mounting threat. Backdoor attackers tamper with training data to embed a vulnerability in models that are trained on that data. This vulnerability is then activated at inference time by placing a "trigger" into the model's input. Typical backdoor attacks insert the trigger directly into the training data, although the presence of such an attack may be visible upon inspection. In contrast, the Hidden Trigger Backdoor Attack achieves poisoning without placing a trigger into the training data at all. However, this hidden trigger attack is ineffective at poisoning neural networks trained from scratch. We develop a new hidden trigger attack, Sleeper Agent, which employs gradient matching, data selection, and target model re-training during the crafting process. Sleeper Agent is the first hidden trigger backdoor attack to be effective against neural networks trained from scratch. We demonstrate its effectiveness on ImageNet and in black-box settings.
Hossein Souri · Liam Fowl · Rama Chellappa · Micah Goldblum · Tom Goldstein 🔗 - Early Layers Are More Important For Adversarial Robustness (Poster)    Adversarial training and its variants have become the de facto standard for combating adversarial attacks in machine learning models. In this paper, we seek insight into how an adversarially trained deep neural network (DNN) differs from its naturally trained counterpart, focusing on the role of different layers in the network. To this end, we develop a novel method to measure and attribute adversarial effectiveness to each layer, based on partial adversarial training. We find that, while all layers in an adversarially trained network contribute to robustness, earlier layers play a more crucial role. These conclusions are corroborated by a method of tracking the impact of adversarial perturbations as they flow across the network layers, based on the statistics of "perturbation-to-signal ratios" across layers. While adversarial training results in black-box DNNs which can only provide empirical assurances of robustness, our findings imply that the search for architectural principles in training and inference for building in robustness in an interpretable manner could start with the early layers of a DNN. Can Bakiskan · Metehan Cekic · Upamanyu Madhow 🔗 - Provably Adversarially Robust Detection of Out-of-Distribution Data (Almost) for Free (Poster)    The application of machine learning in safety-critical systems requires a reliable assessment of uncertainty. However, deep neural networks are known to produce highly overconfident predictions on out-of-distribution (OOD) data. Even if trained to be non-confident on OOD data, one can still adversarially manipulate OOD data so that the classifier again assigns high confidence to the manipulated samples. We show that two previously published defenses can be broken by better adapted attacks, highlighting the importance of robustness guarantees around OOD data.
Since the existing method for this task is hard to train and significantly limits accuracy, we construct a classifier that can simultaneously achieve provability and high clean accuracy. Moreover, by architectural construction our method provably avoids the asymptotic overconfidence problem of standard neural networks. Alexander Meinke · Julian Bitterwolf · Matthias Hein 🔗 - Attacking Adversarial Defences by Smoothing the Loss Landscape (Poster)    This paper investigates a family of methods for defending against adversarial attacks that owe part of their success to creating a rugged loss landscape that adversaries find difficult to navigate. A common, but not universal, way to achieve this effect is via the use of stochastic neural networks. We show that this is a form of gradient obfuscation, and propose a general extension to gradient-based adversaries based on the Weierstrass transform, which smooths the surface of the loss function and provides more reliable gradient estimates. We further show that the same principle can strengthen gradient-free adversaries. We demonstrate the efficacy of our loss-smoothing method against both stochastic and non-stochastic adversarial defences that exhibit robustness due to this type of obfuscation. Furthermore, we provide analysis of how it interacts with Expectation over Transformation, a popular gradient-sampling method currently used to attack stochastic defences. Panagiotis Eustratiadis · Henry Gouk · Da Li · Timothy Hospedales 🔗 - Sound randomized smoothing in floating-point arithmetics (Poster)     Randomized smoothing is sound when using infinite precision. However, we show that randomized smoothing is no longer sound for limited floating-point precision. We present a simple example where randomized smoothing certifies a radius of $1.26$ around a point, even though there is an adversarial example at distance $0.8$, and extend this example further to provide false certificates for CIFAR10.
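For context on the certificates being discussed: in standard Gaussian randomized smoothing, the certified $\ell_2$ radius is computed as $R = \sigma \cdot \Phi^{-1}(p_A)$, where $p_A$ lower-bounds the smoothed probability of the top class. A minimal sketch of this generic formula (evaluated here in ordinary floating point, i.e. exactly the unsound practice this paper examines, not its sound variant):

```python
from statistics import NormalDist

def certified_radius(sigma, p_lower):
    # Certified l2 radius R = sigma * Phi^{-1}(p_lower) for a Gaussian-smoothed
    # classifier; only non-trivial when p_lower > 1/2.
    if p_lower <= 0.5:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_lower)

# With sigma = 0.5 and a lower confidence bound of 0.99 on the top class,
# the certified radius is roughly 1.16.
r = certified_radius(0.5, 0.99)
```

The soundness question raised above is precisely whether such floating-point evaluations of $\Phi^{-1}$ and of the smoothed classifier can be trusted as exact.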
We discuss the implicit assumptions of randomized smoothing and show that they do not apply to generic image classification models whose smoothed versions are commonly certified. In order to overcome this problem, we propose a sound approach to randomized smoothing when using floating-point precision, with essentially equal speed and matching the certificates of the standard, unsound practice for the standard classifiers tested so far. Our only assumption is that we have access to a fair coin. Václav Voráček · Matthias Hein 🔗 - Robustness in deep learning: The width (good), the depth (bad), and the initialization (ugly) (Poster)    We study the average robustness notion in deep neural networks in (selected) wide and narrow, deep and shallow, as well as lazy and non-lazy training settings. We prove that in the under-parameterized setting, width has a negative effect, while it improves robustness in the over-parameterized setting. The effect of depth closely depends on the initialization and the training mode. In particular, when initialized with LeCun initialization, depth helps robustness in the lazy training regime. In contrast, when initialized with the Neural Tangent Kernel (NTK) and He initialization, depth degrades robustness. Moreover, under the non-lazy training regime, we demonstrate how the width of a two-layer ReLU network benefits robustness. Our theoretical developments improve the results of [Huang et al. NeurIPS21; Wu et al. NeurIPS21] and are consistent with [Bubeck and Sellke NeurIPS21; Bubeck et al. COLT21]. Zhenyu Zhu · Fanghui Liu · Grigorios Chrysos · Volkan Cevher 🔗 - Riemannian data-dependent randomized smoothing for neural network certification (Poster) Certification of neural networks is an important and challenging problem that has been attracting the attention of the machine learning community for several years.
In this paper, we focus on randomized smoothing (RS), which is considered the state-of-the-art method to obtain certifiably robust neural networks. In particular, a recently introduced data-dependent RS technique called ANCER can be used to certify ellipses with orthogonal axes around each input data point of the neural network. In this work, we remark that ANCER is not invariant under rotation of input data and propose a new rotationally-invariant formulation of it which can certify ellipses without constraints on their axes. Our approach, called Riemannian Data-Dependent Randomized Smoothing (RDDRS), relies on information geometry techniques on the manifold of covariance matrices and can certify bigger regions than ANCER, based on our experiments on the MNIST dataset. Pol Labarbarie · Hatem Hajri · Marc Arnaudon 🔗 - Adversarial robustness of $\beta$-VAE through the lens of local geometry (Poster)    Variational autoencoders (VAEs) are susceptible to adversarial attacks. An adversary can find a small perturbation in the input sample to change its latent encoding non-smoothly, thereby compromising the reconstruction. A known reason for such vulnerability is the latent space distortions arising from a mismatch between the approximated latent posterior and the prior distribution. As a result, a slight change in the inputs leads to a significant change in the latent space encodings. This paper demonstrates that the sensitivity at any given input exploits the directional bias of a stochastic pullback metric tensor induced by the encoder network. The pullback metric tensor captures how an infinitesimal region changes from the input to the latent space. Thus, it can be viewed as a lens to analyse distortions in the latent space. We propose evaluation scores using the eigenspectrum of a pullback metric. Moreover, we empirically show that the scores correlate with the robustness parameter $\beta$ of the $\beta$-VAE.
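The pullback-metric idea in the last abstract can be made concrete in its deterministic form: for an encoder $f$, the pullback of the Euclidean latent metric is $G = J^\top J$ with $J$ the Jacobian of $f$, and the eigenspectrum of $G$ shows which input directions get amplified in latent space. The toy encoder and finite-difference Jacobian below are our own hypothetical sketch (the paper works with a stochastic metric for the $\beta$-VAE):

```python
import numpy as np

def encoder(x):
    # Hypothetical toy "encoder" R^3 -> R^2, standing in for a VAE encoder mean.
    return np.array([np.tanh(x[0] + 2.0 * x[1]), 0.5 * x[2] ** 2])

def pullback_metric(f, x, h=1e-5):
    # G = J^T J with J the Jacobian of f at x, estimated by central differences.
    out_dim = f(x).size
    J = np.zeros((out_dim, x.size))
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = h
        J[:, i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return J.T @ J

G = pullback_metric(encoder, np.array([0.1, -0.2, 0.3]))
eigvals = np.linalg.eigvalsh(G)  # ascending; large eigenvalues mark sensitive input directions
```

An adversary exploiting "directional bias" would perturb along the top eigenvector of $G$, where a small input change moves the latent code the most.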
Asif Khan · Amos Storkey 🔗 - "Why do so?" --- A practical perspective on adversarial machine learning (Poster)    Despite the large body of academic work on machine learning security, little is known about the occurrence of attacks on machine learning systems in the wild. In this paper, we analyze the answers of 139 industrial practitioners to a quantitative questionnaire about attack occurrence and concern. We find evidence for circumventions of AI systems in practice, although these are not the sole concern of our practitioners, as their reasoning on the relevance and irrelevance of machine learning attacks is complex. Our work paves the way for more research about adversarial machine learning in practice, but also yields insights for machine learning regulation and auditing. Kathrin Grosse · Lukas Bieringer · Tarek R. Besold · Battista Biggio · Katharina Krombholz 🔗 - Adversarial Estimation of Riesz Representers (Poster) We provide an adversarial approach to estimating Riesz representers of linear functionals within arbitrary function spaces. We prove oracle inequalities based on the localized Rademacher complexity of the function space used to approximate the Riesz representer and the approximation error. These inequalities imply fast finite sample mean-squared-error rates for many function spaces of interest, such as high-dimensional sparse linear functions, neural networks and reproducing kernel Hilbert spaces. Our approach offers a new way of estimating Riesz representers with a plethora of recently introduced machine learning techniques. We show how our estimator can be used in the context of de-biasing structural/causal parameters in semi-parametric models, and for automated orthogonalization of moment equations.
Victor Chernozhukov · Whitney Newey · Rahul Singh · Vasilis Syrgkanis 🔗 - Saliency Guided Adversarial Training for Tackling Generalization Gap with Applications to Medical Imaging Classification System (Poster)    This work tackles a central machine learning problem of performance degradation on out-of-distribution (OOD) test sets. The problem is particularly salient in medical imaging-based diagnosis systems that appear to be accurate but fail when tested in new hospitals/datasets. Recent studies indicate that such systems might learn shortcut and non-relevant features instead of generalizable features, the so-called `good features'. We hypothesize that adversarial training can eliminate shortcut features, whereas saliency guided training can filter out non-relevant features; both are nuisance features accounting for the performance degradation on OOD test sets. With that, we formulate a novel model training scheme for the deep neural network to learn good features for classification and/or detection tasks, ensuring consistent generalization performance on OOD test sets. The experimental results qualitatively and quantitatively demonstrate the superior performance of our method on classification tasks using benchmark CXR image datasets. Xin Li · Yao Qiang · CHNEGYIN LI · Sijia Liu · Dongxiao Zhu 🔗 - Self-Destructing Models: Increasing the Costs of Harmful Dual Uses in Foundation Models (Poster)    A growing ecosystem of large, open-source foundation models has reduced the labeled data and technical expertise necessary to apply machine learning to many new problems. Yet foundation models pose a clear dual-use risk, indiscriminately reducing the costs of building both harmful and benign machine learning systems. To mitigate this risk, we propose the task blocking paradigm, in which foundation models are trained with an additional mechanism to impede adaptation to harmful tasks while retaining good performance on desired tasks.
We call the resulting models self-destructing models, inspired by mechanisms that prevent adversaries from using tools for harmful purposes. We present an algorithm for training self-destructing models leveraging techniques from meta-learning and adversarial learning, showing that it can largely prevent a BERT-based model from learning to perform gender identification without harming the model's ability to perform profession classification. We conclude with a discussion of future directions. Eric Mitchell · Peter Henderson · Christopher Manning · Dan Jurafsky · Chelsea Finn 🔗 - Illusionary Attacks on Sequential Decision Makers and Countermeasures (Poster)    Autonomous intelligent agents deployed to the real world need to be robust against adversarial attacks on sensory inputs. Existing work in reinforcement learning focuses on minimum-norm perturbation attacks, which were originally introduced to mimic a notion of perceptual invariance in computer vision. In this paper, we note that such minimum-norm perturbation attacks can be trivially detected by victim agents, as these result in observation sequences that are not consistent with the victim agent's actions. Furthermore, many real-world agents, such as physical robots, commonly operate under human supervisors, who are not susceptible to such perturbation attacks. As a result, we propose to instead focus on illusionary attacks, a novel form of attack that is consistent with the world model of the victim agent. We provide a formal definition of this novel attack framework, explore its characteristics under a variety of conditions, and conclude that agents must seek realism feedback to be robust to illusionary attacks. Tim Franzmeyer · Joao Henriques · Jakob Foerster · Phil Torr · Adel Bibi · Christian Schroeder 🔗 - Can we achieve robustness from data alone? (Poster)    Adversarial training and its variants have come to be the prevailing methods to achieve adversarially robust classification using neural networks.
However, its increased computational cost, together with the significant gap between standard and robust performance, hinders progress and begs the question of whether we can do better. In this work, we take a step back and ask: *Can models gain robustness via standard training on a suitably optimized set?* To this end, we devise a meta-learning method for robust classification that optimizes the dataset prior to its deployment in a principled way, aiming to effectively remove the non-robust parts of the data. We cast our optimization method as a multi-step PGD procedure on kernel regression, with a class of kernels that describe infinitely wide neural nets (Neural Tangent Kernels, or NTKs). Experiments on MNIST and CIFAR-10 demonstrate that the datasets we produce enjoy very high robustness against PGD attacks when deployed in both kernel regression classifiers and neural networks. However, this robustness is somewhat fallacious, as alternative attacks manage to fool the models, which we find to be the case for previous similar works in the literature as well. We discuss potential reasons for this and outline further avenues of research.
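The multi-step PGD procedure this abstract casts its dataset optimization as is a standard projected gradient method; a minimal, illustrative NumPy sketch of PGD on a generic differentiable loss (the toy loss and all names here are hypothetical, not the authors' NTK-based setup):

```python
import numpy as np

def pgd_attack(loss_grad, x, epsilon=0.3, alpha=0.05, steps=10):
    """Multi-step PGD: gradient ascent on the loss, projected back into
    the l-infinity ball of radius epsilon around the clean input x.

    loss_grad: callable returning the gradient of the loss w.r.t. the input.
    """
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(loss_grad(x_adv))  # ascent step
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)   # projection
    return x_adv

# Toy example: the "loss" is the squared distance to a target point t,
# so its gradient w.r.t. x is 2 * (x - t); PGD pushes x away from t.
t = np.array([1.0, -1.0])
grad = lambda x: 2.0 * (x - t)
x_adv = pgd_attack(grad, np.zeros(2))
# x_adv saturates the l_inf constraint: [-0.3, 0.3]
```

The same loop, run on the dataset rather than on individual inputs and with the sign of the step flipped, is the kind of procedure the abstract describes.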
Julia Kempe · Nikolaos Tsilivis · Jingtong Su 🔗 - Gradient-Based Adversarial and Out-of-Distribution Detection (Poster)    We propose to utilize gradients for detecting adversarial and out-of-distribution samples. We introduce confounding labels (labels that differ from the normal labels seen during training) in gradient generation to probe the effective expressivity of neural networks. Gradients depict the amount of change required for a model to properly represent given inputs, providing insight into the representational power of the model established by network architectural properties as well as training data. By introducing a label of different design, we remove the dependency on ground-truth labels for gradient generation during inference. We show that our gradient-based approach allows for capturing the anomaly in inputs based on the effective expressivity of the models with no hyperparameter tuning or additional processing, and outperforms state-of-the-art methods for adversarial and out-of-distribution detection. Jinsol Lee · Mohit Prabhushankar · Ghassan AlRegib 🔗 - Investigating Why Contrastive Learning Benefits Robustness against Label Noise (Poster)  Self-supervised contrastive learning has recently been shown to be very effective in preventing deep networks from overfitting noisy labels. Despite its empirical success, the theoretical understanding of the effect of contrastive learning on boosting robustness is very limited. In this work, we rigorously prove that the learned representation matrix has certain desirable properties, in terms of its SVD, that benefit robustness against label noise. We further show that the low-rank structure of the Jacobian of deep networks pre-trained with contrastive learning allows them to achieve superior performance initially, when fine-tuned on noisy labels.
Finally, we demonstrate that the initial robustness provided by contrastive learning enables robust training methods to achieve state-of-the-art performance under extreme noise levels. Yihao Xue · Kyle Whitecross · Baharan Mirzasoleiman 🔗 - Layerwise Hebbian/anti-Hebbian (HaH) Learning In Deep Networks: A Neuro-inspired Approach To Robustness (Poster)    We propose a neuro-inspired approach for engineering robustness into deep neural networks (DNNs), in which end-to-end cost functions are supplemented with layer-wise costs promoting Hebbian (“fire together, wire together”) updates for highly active neurons, and anti-Hebbian updates for the remaining neurons. Unlike standard end-to-end training, which does not directly exert control over the features extracted at intermediate layers, Hebbian/anti-Hebbian (HaH) learning is aimed at producing sparse, strong activations which are more difficult to corrupt. We further encourage sparsity by introducing competition between neurons via divisive normalization and thresholding, together with implicit $\ell_2$ normalization of neuronal weights instead of batch norm. Preliminary CIFAR-10 experiments demonstrate that our neuro-inspired model, trained without augmentation by noise or adversarial perturbations, is substantially more robust to a range of corruptions than a baseline end-to-end trained model. This opens up exciting research frontiers for training robust DNNs, with layer-wise costs providing a strategy complementary to that of data-augmented end-to-end training. Metehan Cekic · Can Bakiskan · Upamanyu Madhow 🔗 - Efficient and Effective Augmentation Strategy for Adversarial Training (Poster)     The sample complexity of adversarial training is known to be significantly higher than that of standard ERM-based training. Although complex augmentation techniques have led to large gains in standard training, they have not been successful with adversarial training.
In this work, we propose Diverse Augmentation based Joint Adversarial Training (DAJAT), which uses a combination of simple and complex augmentations with separate batch normalization layers to handle the conflicting goals of enhancing the diversity of the training dataset while staying close to the test distribution. We further introduce a Jensen-Shannon divergence loss to encourage the joint learning of the diverse augmentations, thereby allowing simple augmentations to guide the learning of complex ones. Lastly, to improve the computational efficiency of the proposed method, we propose and utilize a two-step defense, Ascending Constraint Adversarial Training (ACAT), which uses an increasing epsilon schedule and weight-space smoothing to prevent gradient masking. The proposed method achieves better performance than existing methods on the RobustBench Leaderboard for CIFAR-10 and CIFAR-100 on ResNet-18 and WideResNet-34-10 architectures. Sravanti Addepalli · Samyak Jain · Venkatesh Babu Radhakrishnan 🔗 - Robust Empirical Risk Minimization with Tolerance (Poster)    Developing simple, sample-efficient learning algorithms for robust classification is a pressing issue in today's tech-dominated world, and current theoretical techniques fall far from answering the need, often requiring exponential sample complexity and (necessarily) complicated improper learning rules. Towards this end, one natural path is to study relaxations of the standard model such as Ashtiani et al. (2022)'s *tolerant* learning, a recent notion where the output classifier is compared to the best achievable error over slightly larger perturbation sets.
In this work, we show that under weak niceness conditions, achieving simple, sample-efficient robust learning is indeed possible: a natural tolerant variant of robust empirical risk minimization is in fact sufficient for learning over arbitrary perturbation sets of bounded diameter $D$ using only $O\left( \frac{vd\log \frac{dD}{\epsilon\gamma\delta}}{\epsilon^2}\right)$ samples for VC-dimension-$v$ hypothesis classes over $\mathbb{R}^d$. Robi Bhattacharjee · Max Hopkins · Akash Kumar · Hantao Yu · Kamalika Chaudhuri 🔗 - Towards Out-of-Distribution Adversarial Robustness (Poster)    Adversarial robustness continues to be a major challenge for deep learning. A core issue is that robustness to one type of attack often fails to transfer to other attacks. While prior work establishes a theoretical trade-off in robustness against different $L_p$ norms, we show that there is space for improvement against many commonly used attacks by adopting a domain generalisation approach. In particular, we treat different attacks as domains and apply the method of Risk Extrapolation (REx), which encourages similar levels of robustness against all training attacks. Compared to existing methods, we obtain similar or superior adversarial robustness on attacks seen during training. More significantly, we achieve superior performance on families or tunings of attacks only encountered at test time. On ensembles of attacks, this improves the accuracy from 3.4% on the best existing baseline to 25.9% on MNIST, and from 10.7% to 17.9% on CIFAR-10.
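The attacks-as-domains idea above amounts to penalizing how unevenly the model's robust risk is spread across attacks; a minimal sketch of a V-REx-style objective (the numbers and names are hypothetical, for illustration only):

```python
import numpy as np

def rex_objective(risks, beta=10.0):
    """V-REx-style objective over per-domain risks.

    Treating each attack as a 'domain', the objective is the mean risk
    plus a penalty on the variance of risks across domains, pushing the
    model toward similar robustness against every training attack.
    """
    risks = np.asarray(risks, dtype=float)
    return risks.mean() + beta * risks.var()

# Hypothetical per-attack robust losses: equal risks incur no penalty;
# unequal risks with the same mean are penalised.
balanced = rex_objective([0.4, 0.4, 0.4])    # variance 0 -> just the mean, 0.4
unbalanced = rex_objective([0.1, 0.4, 0.7])  # 0.4 + 10 * 0.06 = 1.0
```

Minimizing this objective instead of the plain mean is what "encourages similar levels of robustness against all training attacks".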
Adam Ibrahim · Charles Guille-Escuret · Ioannis Mitliagkas · Irina Rish · David Krueger · Pouya Bashivan 🔗 - Reducing Exploitability with Population Based Training (Poster)    Self-play reinforcement learning has achieved state-of-the-art, and often superhuman, performance in a variety of zero-sum games. Yet prior work has found that policies that are highly capable against regular opponents can fail catastrophically against adversarial policies: an opponent trained explicitly against the victim. Prior defenses using adversarial training were able to make the victim robust to a specific adversary, but the victim remained vulnerable to new adversaries. We conjecture this limitation was due to insufficient diversity of adversaries seen during training. We propose a defense using population-based training to pit the victim against a range of opponents. We evaluate this defense's robustness against new adversaries in two low-dimensional environments. We find that our defense significantly increases robustness against adversaries in both environments and show that robustness is correlated with the size of the opponent population. Pavel Czempin 🔗 - RUSH: Robust Contrastive Learning via Randomized Smoothing (Poster)  Recently, adversarial training has been incorporated into self-supervised contrastive pre-training to improve label efficiency while conferring adversarial robustness. However, this robustness comes at the cost of expensive adversarial training. In this paper, we show the surprising fact that contrastive pre-training has an interesting yet implicit connection with robustness, and that such natural robustness in the pre-trained representation enables us to design a powerful algorithm robust against adversarial attacks, RUSH, which combines standard contrastive pre-training and randomized smoothing. It boosts both standard accuracy and robust accuracy, and significantly reduces training costs compared with adversarial training.
We use extensive empirical studies to show that the proposed RUSH outperforms robust classifiers from adversarial training by a significant margin on common benchmarks (CIFAR-10, CIFAR-100, and STL-10) under first-order attacks. In particular, under a PGD attack with $\ell_\infty$-norm perturbations of size 8/255 on CIFAR-10, our model using ResNet-18 as backbone reached 77.8% robust accuracy and 87.9% standard accuracy. Our work achieves an improvement of over 15% in robust accuracy and a slight improvement in standard accuracy, compared to the state of the art. Yijiang Pang · Boyang Liu · Jiayu Zhou 🔗 - Welcome to New Frontiers in Adversarial Machine Learning @ ICML 2022! (Introduction) This is an introduction video for authors. Yuguang Yao 🔗 - Dr. Nitesh Chawla's Talk (Talk)  link » TBD Link » Nitesh Chawla 🔗
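The randomized smoothing that RUSH builds on is, at prediction time, a majority vote of the base classifier under Gaussian input noise (Cohen et al., 2019). A minimal sketch, assuming a toy base classifier; this is illustrative only, not the authors' implementation:

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma=0.25, n=1000, seed=0):
    """Majority vote of a base classifier over n Gaussian-noised copies of x.

    The vote margin can further be converted into a certified l2
    robustness radius, which is what makes smoothing attractive.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n,) + x.shape)
    votes = np.bincount([base_classifier(x + e) for e in noise])
    return int(votes.argmax())

# Hypothetical base classifier: class 1 iff the first coordinate is positive.
clf = lambda x: int(x[0] > 0)
pred = smoothed_predict(clf, np.array([0.5, 0.0]))  # vote is dominated by class 1
```

Combining such a smoothed predictor with a contrastively pre-trained backbone, rather than an adversarially trained one, is the cost saving RUSH claims.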

#### Author Information

##### Dongxiao Zhu (Wayne State University)

Dongxiao Zhu is currently an Associate Professor in the Department of Computer Science at Wayne State University. He received his B.S. from Shandong University (1996), his M.S. from Peking University (1999), and his Ph.D. from the University of Michigan (2006). His recent research interests are in machine learning and its applications in health informatics, natural language processing, medical imaging, and other data science domains. Dr. Zhu is the Director of the Machine Learning and Predictive Analytics (MLPA) Lab and the Director of the Computer Science Graduate Program at Wayne State University. He has published over 70 peer-reviewed publications and numerous book chapters, and has served on several editorial boards of scientific journals. His research has been supported by NIH, NSF, and private agencies, and he has served on multiple NIH and NSF grant review panels. Dr. Zhu has advised numerous students at the undergraduate, graduate, and postdoctoral levels, and his teaching interests lie in programming languages, data structures and algorithms, machine learning, and data science.

##### Sanmi Koyejo (Google / Illinois)

Sanmi (Oluwasanmi) Koyejo is an Assistant Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Koyejo's research interests are in the development and analysis of probabilistic and statistical machine learning techniques motivated by, and applied to, various modern big data problems. He is particularly interested in the analysis of large-scale neuroimaging data. Koyejo completed his Ph.D. in Electrical Engineering at the University of Texas at Austin, advised by Joydeep Ghosh, and completed postdoctoral research at Stanford University with a focus on developing machine learning techniques for neuroimaging data. His postdoctoral research was primarily with Russell A. Poldrack and Pradeep Ravikumar. Koyejo has received several awards, including the outstanding NCE/ECE student award, a best student paper award from the Conference on Uncertainty in Artificial Intelligence (UAI), and a trainee award from the Organization for Human Brain Mapping (OHBM).