On the Clean Generalization and Robust Overfitting in Adversarial Training from Two Theoretical Views: Representation Complexity and Training Dynamics
Abstract
Lay Summary
Adversarial training, like standard deep learning, enables deep neural networks to generalize well to unseen clean data. However, even though adversarial training fits the adversarially perturbed training data well, a significant gap in robust generalization remains: the trained models stay vulnerable to perturbed test examples. We call this the Clean Generalization and Robust Overfitting (CGRO) phenomenon. In this study, we explore CGRO from two theoretical perspectives: representation complexity and training dynamics. We show that a simple neural network can achieve CGRO by robustly memorizing the training data, while a fully robust classifier requires much more complex representations. We also analyze the training process of a convolutional network and identify a three-stage phase transition during learning that leads to robust memorization and thereby explains the CGRO phenomenon. Our theoretical analysis is supported by experiments on real-world image-recognition datasets.
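To make the setting concrete, below is a minimal sketch of PGD-based adversarial training together with the clean-versus-robust evaluation that exposes a CGRO-style gap. Everything here is illustrative and not from the paper: the tiny MLP, the random synthetic data, and the perturbation budget (eps), step size (alpha), and step count are all placeholder choices standing in for the image classifiers and datasets studied in the work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy setup: a small MLP on synthetic data, standing in
# for the image classifiers studied in the paper.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

X_train = torch.randn(256, 20)
y_train = torch.randint(0, 2, (256,))

def pgd_attack(model, x, y, eps=0.25, alpha=0.05, steps=10):
    """L-inf PGD: search the eps-ball around x for a loss-maximizing perturbation."""
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).detach()

# Adversarial training: minimize the loss on worst-case perturbed inputs.
for epoch in range(50):
    x_adv = pgd_attack(model, X_train, y_train)
    opt.zero_grad()
    F.cross_entropy(model(x_adv), y_train).backward()
    opt.step()

# CGRO diagnostic: compare clean vs. robust accuracy on held-out data.
# On real image data, clean accuracy stays high while robust accuracy lags.
X_test = torch.randn(256, 20)
y_test = torch.randint(0, 2, (256,))
with torch.no_grad():
    clean_acc = (model(X_test).argmax(1) == y_test).float().mean()
x_test_adv = pgd_attack(model, X_test, y_test)
with torch.no_grad():
    robust_acc = (model(x_test_adv).argmax(1) == y_test).float().mean()
print(f"clean test acc: {clean_acc:.3f}, robust test acc: {robust_acc:.3f}")
```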