From ImageNet to Image Classification: Contextualizing Progress on Benchmarks

Dimitris Tsipras · Shibani Santurkar · Logan Engstrom · Andrew Ilyas · Aleksander Madry

Keywords: [ Computer Vision ] [ Crowdsourcing ] [ Other ] [ Deep Learning - General ]

Abstract
Thu 16 Jul 6 a.m. PDT — 6:45 a.m. PDT
Thu 16 Jul 5 p.m. PDT — 5:45 p.m. PDT


Building rich machine learning datasets in a scalable manner often necessitates a crowdsourced data collection pipeline. In this work, we use human studies to investigate the consequences of employing such a pipeline, focusing on the popular ImageNet dataset. We study how specific design choices in the ImageNet creation process impact the fidelity of the resulting dataset, including the introduction of biases that state-of-the-art models exploit. Our analysis pinpoints how a noisy data collection pipeline can lead to a systematic misalignment between the resulting benchmark and the real-world task it serves as a proxy for. Finally, our findings emphasize the need to augment our current model training and evaluation toolkit to take such misalignment into account.
