Track: Oral B4 Robustness / Adversarial / Rl Bandits

Wed 26 July 19:00 - 19:08 PDT

Adversarial Policies Beat Superhuman Go AIs

Tony Wang · Adam Gleave · Tom Tseng · Kellin Pelrine · Nora Belrose · Joseph Miller · Michael Dennis · Yawen Duan · Viktor Pogrebniak · Sergey Levine · Stuart Russell

We attack the state-of-the-art Go-playing AI system KataGo by training adversarial policies against it, achieving a >97% win rate against KataGo running at superhuman settings. Our adversaries do not win by playing Go well. Instead, they trick KataGo into making serious blunders. Our attack transfers zero-shot to other superhuman Go-playing AIs, and is comprehensible to the extent that human experts can implement it without algorithmic assistance to consistently beat superhuman AIs. The core vulnerability uncovered by our attack persists even in KataGo agents adversarially trained to defend against our attack. Our results demonstrate that even superhuman AI systems may harbor surprising failure modes. Example games are available https://goattack.far.ai/.

Wed 26 July 19:08 - 19:16 PDT

Outstanding Paper

Adapting to game trees in zero-sum imperfect information games

Côme Fiegel · Pierre Menard · Tadashi Kozuno · Remi Munos · Vianney Perchet · Michal Valko

Imperfect information games (IIG) are games in which each player only partially observes the current game state. We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound $\widetilde{\mathcal{O}}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the required number of realizations to learn these strategies with high probability, where $H$ is the length of the game, $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the total number of actions for the two players. We also propose two Follow the Regularized leader (FTRL) algorithms for this setting: Balanced FTRL which matches this lower bound, but requires the knowledge of the information set structure beforehand to define the regularization; and Adaptive FTRL which needs $\widetilde{\mathcal{O}}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ realizations without this requirement by progressively adapting the regularization to the observations.

Wed 26 July 19:16 - 19:24 PDT

Semi Bandit dynamics in Congestion Games: Convergence to Nash Equilibrium and No-Regret Guarantees.

Ioannis Panageas · EFSTRATIOS PANTELEIMON SKOULAKIS · Luca Viano · Xiao Wang · Volkan Cevher

In this work, we propose introduce a variant of online stochastic gradient descent and prove it converges to Nash equilibria and simultaneously it has sublinear regret for the class of congestion games in the semi-bandit feedback setting. Our proposed method admits convergence rates depending only polynomially on the number of players and the number of facilities, but not on the size of the action set, which can be exponentially large in terms of the number of facilities. Moreover, the running time of our method has polynomial-time dependence on the implicit description of the game. Our analysis exploits techniques from convex geometry, in particular Caratheodory's theorem and recent advances in non-convex stochastic optimization. This work improves upon and answers an open question from (Cui et al 2022).

Wed 26 July 19:24 - 19:32 PDT

Delving into Noisy Label Detection with Clean Data

Chenglin Yu · Xinsong Ma · Weiwei Liu

A critical element of learning with noisy labels is noisy label detection. Notably, numerous previous works assume that no source of labels can be clean in a noisy label detection context. In this work, we relax this assumption and assume that a small subset of the training data is clean, which enables substantial noisy label detection performance gains. Specifically, we propose a novel framework that leverages clean data by framing the problem of noisy label detection with clean data as a multiple hypothesis testing problem. Moreover, we propose BHN, a simple yet effective approach for noisy label detection that integrates the Benjamini-Hochberg (BH) procedure into deep neural networks. BHN achieves $\textit{state-of-the-art}$ performance and outperforms baselines by $\textbf{28.48}$% in terms of false discovery rate (FDR) and by $\textbf{18.99}$% in terms of F1 on CIFAR-10. Extensive ablation studies further demonstrate the superiority of BHN. Our code is available at https://github.com/ChenglinYu/BHN.

Wed 26 July 19:32 - 19:40 PDT

Robustly Learning a Single Neuron via Sharpness

Puqian Wang · Nikos Zarifis · Ilias Diakonikolas · Jelena Diakonikolas

We study the problem of learning a single neuron with respect to the $L_2^2$-loss in the presence of adversarial label noise. We give an efficient algorithm that, for a broad family of activations including ReLUs, approximates the optimal $L_2^2$-error within a constant factor. Notably, our algorithm succeeds under much milder distributional assumptions compared to prior work. The key ingredient enabling our results is a novel connection to local error bounds from optimization theory.

Wed 26 July 19:40 - 19:48 PDT

Data Feedback Loops: Model-driven Amplification of Dataset Biases

Rohan Taori · Tatsunori Hashimoto

Datasets scraped from the internet have been critical to large-scale machine learning. Yet, its success puts the utility of future internet-derived datasets at potential risk, as model outputs begin to replace human annotations as a source of supervision. In this work, we formalize a system where interactions with one model are recorded as history and scraped as training data in the future. We then analyze its stability over time by tracking changes to a test-time bias statistic (e.g. gender bias of model predictions). We find that the degree of bias amplification is closely linked to whether the model's outputs behave like samples from the training distribution, a behavior which we characterize and define as uniform faithfulness. Experiments in three conditional prediction scenarios -- image classification, visual role-labeling, and language generation -- demonstrate that models that exhibit a sampling-like behavior are more faithful and thus more stable. Based on this insight, we propose an intervention to help mitigate and stabilize unstable feedback systems.

Wed 26 July 19:48 - 19:56 PDT

Towards Reliable Neural Specifications

Chuqin Geng · Van Nham Le · Xiaojie Xu · Zhaoyue Wang · Arie Gurfinkel · Xujie Si

Having reliable specifications is an unavoidable challenge in achieving verifiable correctness, robustness, and interpretability of AI systems. Existing specifications for neural networks are in the paradigm of data as specification. That is, the local neighborhood centering around a reference input is considered to be correct (or robust). While existing specifications contribute to verifying adversarial robustness, a significant problem in many research domains, our empirical study shows that those verified regions are somewhat tight, and thus fail to allow verification of test set inputs, making them impractical for some real-world applications. To this end, we propose a new family of specifications called neural representation as specification. This form of specifications uses the intrinsic information of neural networks, specifically neural activation patterns (NAPs), rather than input data to specify the correctness and/or robustness of neural network predictions. We present a simple statistical approach to mining neural activation patterns. To show the effectiveness of discovered NAPs, we formally verify several important properties, such as various types of misclassifications will never happen for a given NAP, and there is no ambiguity between different NAPs. We show that by using NAP, we can verify a significant region of the input space, while still recalling 84% of the data on MNIST. Moreover, we can push the verifiable bound to 10 times larger on the CIFAR10 benchmark. Thus, we argue that NAPs can potentially be used as a more reliable and extensible specification for neural network verification.

Wed 26 July 19:56 - 20:04 PDT

Do Perceptually Aligned Gradients Imply Robustness?

Roy Ganz · Bahjat Kawar · Michael Elad

Adversarially robust classifiers possess a trait that non-robust models do not - Perceptually Aligned Gradients (PAG). Their gradients with respect to the input align well with human perception. Several works have identified PAG as a byproduct of robust training, but none have considered it as a standalone phenomenon nor studied its own implications. In this work, we focus on this trait and test whether Perceptually Aligned Gradients imply Robustness. To this end, we develop a novel objective to directly promote PAG in training classifiers and examine whether models with such gradients are more robust to adversarial attacks. Extensive experiments on multiple datasets and architectures validate that models with aligned gradients exhibit significant robustness, exposing the surprising bidirectional connection between PAG and robustness. Lastly, we show that better gradient alignment leads to increased robustness and harness this observation to boost the robustness of existing adversarial training techniques.

Wed 26 July 20:04 - 20:12 PDT

ODS: Test-Time Adaptation in the Presence of Open-World Data Shift

Zhi Zhou · Lan-Zhe Guo · Lin-Han Jia · Dingchu Zhang · Yu-Feng Li

Test-time adaptation (TTA) adapts a source model to the distribution shift in testing data without using any source data. There have been plenty of algorithms concentrated on covariate shift in the last decade, i.e., $\mathcal{D}_t(X)$, the distribution of the test data is different from the source data. Nonetheless, in real application scenarios, it is necessary to consider the influence of label distribution shift, i.e., both $\mathcal{D}_t(X)$ and $\mathcal{D}_t(Y)$ are shifted, which has not been sufficiently explored yet. To remedy this, we study a new problem setup, namely, TTA with Open-world Data Shift (AODS). The goal of AODS is simultaneously adapting a model to covariate and label distribution shifts in the test phase. In this paper, we first analyze the relationship between classification error and distribution shifts. Motivated by this, we hence propose a new framework, namely ODS, which decouples the mixed distribution shift and then addresses covariate and label distribution shifts accordingly. We conduct experiments on multiple benchmarks with different types of shifts, and the results demonstrate the superior performance of our method against the state of the arts. Moreover, ODS is suitable for many TTA algorithms.

Wed 26 July 20:12 - 20:20 PDT

Analysis of Error Feedback in Federated Non-Convex Optimization with Biased Compression: Fast Convergence and Partial Participation

Xiaoyun Li · Ping Li

In practical federated learning (FL) systems, the communication cost between the clients and the central server can often be a bottleneck. In this paper, we focus on biased gradient compression in non-convex FL problems. In the classical distributed learning, the method of error feedback (EF) is a common technique to remedy the downsides of biased gradient compression, but the performance of EF in FL still lacks systematic investigation. In this work, we study a compressed FL scheme with error feedback, named Fed-EF, with two variants depending on the global model optimizer. While directly applying biased compression in FL leads to poor convergence, we show that Fed-EF is able to match the convergence rate of the full-precision FL counterpart with a linear speedup w.r.t. the number of clients. Experiments verify that Fed-EF achieves the same performance as the full-precision FL approach, at the substantially reduced communication cost. Moreover, we develop a new analysis of the EF under partial participation (PP), an important scenario in FL. Under PP, the convergence rate of Fed-EF exhibits an extra slow-down factor due to a so-called ``stale error compensation'' effect, which is also justified in our experiments. Our results provide insights on a theoretical limitation of EF, and possible directions for improvements.