Posterior Mismatch Matters: Adversarial Training for Long-Tailed Robustness
Abstract
Adversarial training breaks down in long-tailed settings, exhibiting severe robustness degradation on the worst-performing (often tail) classes. We identify a key cause of this failure as a posterior mismatch: coarse-grained absolute labels collapse class posteriors into point estimates, leading to biased class-frequency estimation and an enlarged robust generalization gap, which ultimately amplifies worst-class vulnerability. To address this issue, we propose Posterior-driven Adversarial Training (PAT), which learns a posterior surrogate to provide fine-grained probabilistic supervision for adversarial training and integrates weight perturbations to encourage a flatter loss landscape. Our theory shows that accurate posterior approximation simultaneously tightens the class-frequency estimation error and the robust generalization bound, while a flat loss landscape in weight space reduces sensitivity to posterior approximation errors. Extensive experiments on long-tailed benchmarks confirm that PAT consistently improves robustness, with especially large gains on the worst-performing classes.
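The abstract describes PAT only at a high level; the snippet below is a minimal, hypothetical PyTorch-style sketch of how posterior (soft-label) supervision could replace one-hot labels in an adversarial training step. The function names (`pat_step`, `pgd_attack`), the KL-divergence loss, the PGD hyperparameters, and the use of a separately trained `surrogate` network as the posterior surrogate are assumptions not stated in the abstract, and the weight-perturbation (flatness) step is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y_soft, eps=8/255, alpha=2/255, steps=10):
    """Craft PGD adversarial examples that maximize KL divergence to soft
    posterior targets (hypothetical inner maximization; details may differ)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.kl_div(F.log_softmax(model(x_adv), dim=1), y_soft,
                        reduction="batchmean")
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the eps-ball and valid range.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1).detach()
    return x_adv

def pat_step(model, surrogate, optimizer, x):
    """One hypothetical PAT-style update: the posterior surrogate supplies
    soft class posteriors in place of one-hot labels; weight perturbation
    for loss-landscape flatness is not shown."""
    with torch.no_grad():
        y_soft = F.softmax(surrogate(x), dim=1)  # fine-grained probabilistic targets
    x_adv = pgd_attack(model, x, y_soft)
    optimizer.zero_grad()
    loss = F.kl_div(F.log_softmax(model(x_adv), dim=1), y_soft,
                    reduction="batchmean")
    loss.backward()
    optimizer.step()
    return loss.item()
```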