On the Connection between Pre-training Data Diversity and Robustness
Vivek Ramanujan · Thao Nguyen · Ludwig Schmidt · Ali Farhadi
Event URL: https://openreview.net/forum?id=Rz8SSW9HI92

Our work studies the implications of transfer learning on model behavior beyond accuracy: how does the pre-training distribution affect the downstream robustness of a fine-tuned model? We analyze model robustness using the framework of Taori et al. (2020), which shows that in-distribution and out-of-distribution performance are strongly correlated along a linear trend. We explore various interventions that significantly alter the pre-training distribution, including changes to the label space, label semantics, and the pre-training dataset itself. In most cases, these changes have minimal impact on the linear trend produced by models pre-trained on the full ImageNet dataset. We demonstrate these findings on pre-training distributions constructed from ImageNet and iNaturalist, and on fine-tuning data from the iWildCam-WILDS benchmark.
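As an illustrative sketch (not code from the paper), the snippet below shows how the Taori et al. (2020) linear-trend analysis can be set up: in-distribution and out-of-distribution accuracies for a collection of models are probit-transformed, a baseline line is fit, and a model's "effective robustness" is its deviation above that line. All accuracy values and helper names here are hypothetical.

```python
# Minimal sketch of the Taori et al. (2020) linear-trend analysis.
# Hypothetical data and helper names, for illustration only.
import numpy as np
from scipy.stats import norm

def probit(acc):
    """Map accuracies in (0, 1) to probit space, as in Taori et al."""
    return norm.ppf(np.clip(acc, 1e-6, 1 - 1e-6))

def fit_linear_trend(id_acc, ood_acc):
    """Least-squares fit of ood = a * id + b in probit space."""
    a, b = np.polyfit(probit(np.asarray(id_acc)), probit(np.asarray(ood_acc)), deg=1)
    return a, b

def effective_robustness(id_acc, ood_acc, a, b):
    """Deviation (in probit space) of a model above the baseline trend."""
    return probit(ood_acc) - (a * probit(id_acc) + b)

# Hypothetical accuracies for a collection of baseline models.
id_accs = [0.55, 0.62, 0.70, 0.76, 0.81]
ood_accs = [0.30, 0.37, 0.45, 0.52, 0.58]
a, b = fit_linear_trend(id_accs, ood_accs)

# A fine-tuned model is "effectively robust" if it sits above the trend.
print(effective_robustness(0.75, 0.55, a, b))
```

Under this framing, the paper's interventions on the pre-training distribution ask whether fine-tuned models move off the baseline trend, rather than whether their raw accuracy changes.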

Author Information

Vivek Ramanujan (Department of Computer Science, University of Washington)
Thao Nguyen (University of Washington)
Ludwig Schmidt (University of Washington)
Ali Farhadi (University of Washington)