Timezone: »
Dataset replication is a useful tool for assessing whether improvements in test accuracy on a specific benchmark correspond to improvements in models' ability to generalize reliably. In this work, we present unintuitive yet significant ways in which standard approaches to dataset replication introduce statistical bias, skewing the resulting observations. We study ImageNet-v2, a replication of the ImageNet dataset on which models exhibit a significant (11-14%) drop in accuracy, even after controlling for selection frequency, a human-in-the-loop measure of data quality. We show that after remeasuring selection frequencies and correcting for statistical bias, only an estimated 3.6% of the original 11.7% accuracy drop remains unaccounted for. We conclude with concrete recommendations for recognizing and avoiding bias in dataset replication. Code for our study is publicly available: https://git.io/data-rep-analysis.
Author Information
Logan Engstrom (MIT)
Andrew Ilyas (Massachusetts Institute of Technology)
Shibani Santurkar (MIT)
Dimitris Tsipras (MIT)
Jacob Steinhardt (University of California, Berkeley)
Aleksander Madry (MIT)
More from the Same Authors
-
2020 Poster: From ImageNet to Image Classification: Contextualizing Progress on Benchmarks »
Dimitris Tsipras · Shibani Santurkar · Logan Engstrom · Andrew Ilyas · Aleksander Madry -
2019 Workshop: Identifying and Understanding Deep Learning Phenomena »
Hanie Sedghi · Samy Bengio · Kenji Hata · Aleksander Madry · Ari Morcos · Behnam Neyshabur · Maithra Raghu · Ali Rahimi · Ludwig Schmidt · Ying Xiao -
2019 Workshop: Workshop on the Security and Privacy of Machine Learning »
Nicolas Papernot · Florian Tramer · Bo Li · Dan Boneh · David Evans · Somesh Jha · Percy Liang · Patrick McDaniel · Jacob Steinhardt · Dawn Song -
2019 Poster: Sever: A Robust Meta-Algorithm for Stochastic Optimization »
Ilias Diakonikolas · Gautam Kamath · Daniel Kane · Jerry Li · Jacob Steinhardt · Alistair Stewart -
2019 Poster: Exploring the Landscape of Spatial Robustness »
Logan Engstrom · Brandon Tran · Dimitris Tsipras · Ludwig Schmidt · Aleksander Madry -
2019 Oral: Sever: A Robust Meta-Algorithm for Stochastic Optimization »
Ilias Diakonikolas · Gautam Kamath · Daniel Kane · Jerry Li · Jacob Steinhardt · Alistair Stewart -
2019 Oral: Exploring the Landscape of Spatial Robustness »
Logan Engstrom · Brandon Tran · Dimitris Tsipras · Ludwig Schmidt · Aleksander Madry -
2018 Poster: On the Limitations of First-Order Approximation in GAN Dynamics »
Jerry Li · Aleksander Madry · John Peebles · Ludwig Schmidt -
2018 Oral: On the Limitations of First-Order Approximation in GAN Dynamics »
Jerry Li · Aleksander Madry · John Peebles · Ludwig Schmidt -
2018 Poster: Black-box Adversarial Attacks with Limited Queries and Information »
Andrew Ilyas · Logan Engstrom · Anish Athalye · Jessy Lin -
2018 Oral: Black-box Adversarial Attacks with Limited Queries and Information »
Andrew Ilyas · Logan Engstrom · Anish Athalye · Jessy Lin -
2018 Poster: Synthesizing Robust Adversarial Examples »
Anish Athalye · Logan Engstrom · Andrew Ilyas · Kevin Kwok -
2018 Poster: A Classification-Based Study of Covariate Shift in GAN Distributions »
Shibani Santurkar · Ludwig Schmidt · Aleksander Madry -
2018 Oral: Synthesizing Robust Adversarial Examples »
Anish Athalye · Logan Engstrom · Andrew Ilyas · Kevin Kwok -
2018 Oral: A Classification-Based Study of Covariate Shift in GAN Distributions »
Shibani Santurkar · Ludwig Schmidt · Aleksander Madry -
2017 Poster: Deep Tensor Convolution on Multicores »
David Budden · Alexander Matveev · Shibani Santurkar · Shraman Ray Chaudhuri · Nir Shavit -
2017 Talk: Deep Tensor Convolution on Multicores »
David Budden · Alexander Matveev · Shibani Santurkar · Shraman Ray Chaudhuri · Nir Shavit