Understanding the Detrimental Class-level Effects of Data Augmentation
Polina Kirichenko · Mark Ibrahim · Randall Balestriero · Diane Bouchacourt · Ramakrishna Vedantam · Hamed Firooz · Andrew Wilson
Event URL: https://openreview.net/forum?id=dQkeoGnn68
Data augmentation (DA) encodes invariance and provides implicit regularization critical to a model's performance in image classification tasks. However, while DA improves average accuracy, recent studies have shown that its impact can be highly class dependent: achieving optimal average accuracy comes at the cost of significantly hurting individual class accuracy by as much as $20\%$ on ImageNet. In this work, we present a framework for understanding how DA interacts with class-level learning dynamics. Using higher-quality multi-label annotations on ImageNet, we systematically categorize the affected classes and find that the majority are inherently ambiguous, spuriously correlated, or involve fine-grained distinctions, while DA controls the model's bias towards one of the closely related classes. While many of the previously reported performance drops are explained by multi-label annotations, our analysis of class confusions reveals other sources of accuracy degradation. We show that simple class-conditional augmentation strategies informed by our framework improve performance on the negatively affected classes.
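The class-conditional strategy described above can be illustrated with a minimal sketch: augmentation is skipped for classes identified as negatively affected, and applied as usual elsewhere. The `AFFECTED_CLASSES` set, the toy nested-list image representation, and the function names are all hypothetical placeholders, not the paper's actual implementation.

```python
import random

# Hypothetical set of class indices found to be hurt by augmentation;
# in practice these would come from a class-level accuracy analysis.
AFFECTED_CLASSES = {3, 7}

def horizontal_flip(image):
    """Flip each row of a 2D image (a list of lists) left-to-right."""
    return [row[::-1] for row in image]

def class_conditional_augment(image, label, p=0.5, rng=random):
    """Apply a random horizontal flip only to classes where DA is not harmful.

    Images from classes in AFFECTED_CLASSES are returned unchanged, so the
    augmentation does not bias the model between closely related classes.
    """
    if label in AFFECTED_CLASSES:
        return image
    if rng.random() < p:
        return horizontal_flip(image)
    return image
```

In a real training pipeline the same conditional logic would wrap a standard augmentation stack (e.g. random crops and flips) inside the dataset's transform.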

Author Information

Polina Kirichenko (New York University)
Mark Ibrahim (Fundamental AI Research (FAIR), Meta AI)
Randall Balestriero (Rice University)
Diane Bouchacourt (Meta)
Ramakrishna Vedantam (Self Employed)
Hamed Firooz (Facebook)
Andrew Wilson (New York University)
