

Poster

Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies

Brian Bartoldson · James Diffenderfer · Konstantinos Parasyris · Bhavya Kailkhura


Abstract: Neural network inputs can be subtly modified to produce undesirable behaviors ranging from image misclassification to guardrail failure in generative models. Training on adversarially perturbed inputs can improve model robustness to such failures and is standard practice, but such training appears too costly to be a general solution: over $10^{21}$ FLOPs, almost a tenth of the Llama 7B pretraining cost, is needed just to adversarially train a CIFAR10 model past 71\% robustness. Here, we reexamine the apparent intractability of the robustness problem suggested by its state on CIFAR10. First, we study three common adversarial training choices---model size, dataset size, and synthetic data quality---by fitting the first scaling laws that model their effects to hundreds of CIFAR10 adversarial training results. Via our scaling laws, we obtain compute-efficient setups that match the prior SOTA with 20\% (55\%) fewer training (inference) FLOPs, and setups that surpass the prior SOTA by 3\% (AutoAttack accuracy). However, our new CIFAR10 robustness SOTA is just 74\% (AutoAttack accuracy). Further, our scaling laws predict that robustness slowly climbs and then asymptotes at 90\%: i.e., dwarfing our SOTA by scaling is impractical, and perfect robustness is impossible. To better understand this limit, we assess human performance on the data generated by AutoAttack to fool our SOTA model. While such attacks are small ($\ell_p$-norm bounded), we nonetheless find that even humans misclassify at least $\sim$10\% of attacked data, consistent with our asymptotes. Humans err when attacks produce *invalid adversarial data*, which depicts an instance of the wrong class or an unclear object. We take first steps towards addressing invalid data, including removing it from benchmarking to clarify the true state of progress towards human robustness.
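
The abstract describes fitting scaling laws under which robust accuracy climbs with training scale and then asymptotes. As an illustration only, the sketch below fits a hypothetical saturating power law, $R(C) = R_\infty - a\,C^{-b}$, to made-up robustness-vs-compute points; the functional form, the data, and the parameter names are assumptions for illustration, not the paper's actual law or results.

```python
import numpy as np
from scipy.optimize import curve_fit

def saturating_law(compute_exaflops, r_inf, a, b):
    # Robust accuracy that rises with compute and saturates at r_inf.
    return r_inf - a * compute_exaflops ** (-b)

# Hypothetical (training compute, AutoAttack accuracy) pairs; the paper's
# fits instead use hundreds of real CIFAR10 adversarial-training runs.
compute_exaflops = np.array([1.0, 3.0, 10.0, 30.0, 100.0, 300.0, 1000.0])
robust_acc = np.array([0.55, 0.59, 0.62, 0.65, 0.68, 0.70, 0.72])

params, _ = curve_fit(
    saturating_law,
    compute_exaflops,
    robust_acc,
    p0=(0.9, 0.35, 0.1),                        # rough initial guess
    bounds=([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]),  # keep accuracies in [0, 1]
)
r_inf, a, b = params
print(f"Estimated robustness asymptote: {r_inf:.1%}")
```

Extrapolating such a fit beyond the measured compute range is what lets one ask whether scaling alone can ever close the gap to perfect robustness.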
