

Poster in DMLR Workshop: Data-centric Machine Learning Research

Does Progress On Object Recognition Benchmarks Improve Real-World Generalization?

Megan Richards · Diane Bouchacourt · Mark Ibrahim · Polina Kirichenko


Abstract:

Recent advances in foundation models, trained on orders of magnitude more data, have begun to saturate standard generalization benchmarks, which tend to focus on predefined or synthetic alterations of images. Despite this progress, even today's best models are brittle in practice. Consequently, we propose studying generalization across geography as a more realistic measure of progress, using two datasets of household objects collected across the globe. We conduct an extensive empirical evaluation of nearly 100 vision models, including the most recent foundation models, examining both the rate of progress and disparities in performance not revealed by average accuracy. We first identify a progress gap between standard benchmarks and real-world geographic shifts: progress on ImageNet yields up to 2.5x more progress on standard generalization benchmarks than on real-world distribution shifts. Second, we study model generalization across geographies by measuring disparities in performance across regions, a more fine-grained measure of real-world generalization. We observe that all models, even CLIP foundation models, exhibit large geographic disparities, with accuracy differing by 7% to 20% between regions. Counter to modern intuition, we find that progress on standard benchmarks fails to reduce geographic disparities and in many cases exacerbates them: geographic disparities have more than tripled from the least performant models to today's best models. Our results suggest scaling alone is insufficient for consistent robustness to real-world distribution shifts. We highlight the need for more representative benchmarking and more precise measures of generalization progress.
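To make the region-disparity measure concrete, the sketch below (not the authors' code) shows one plausible way to compute a per-region accuracy gap, assuming per-image model predictions annotated with a region label; the column names and toy data are illustrative assumptions.

```python
# Hypothetical sketch: per-region accuracy and geographic disparity
# (gap between the best- and worst-performing regions) for one model.
# Column names ("region", "correct") are assumptions, not from the paper.
import pandas as pd

def geographic_disparity(results: pd.DataFrame) -> float:
    """results: one row per image, with a 'region' label and a boolean
    'correct' flag indicating whether the model's prediction was right."""
    per_region_acc = results.groupby("region")["correct"].mean()
    return per_region_acc.max() - per_region_acc.min()

# Toy usage with made-up predictions:
toy = pd.DataFrame({
    "region": ["Africa", "Africa", "Europe", "Europe", "Asia", "Asia"],
    "correct": [True, False, True, True, True, False],
})
print(geographic_disparity(toy))  # 0.5, i.e. a 50-point accuracy gap
```

Under this kind of metric, a model can have high average accuracy while still showing a large gap between its strongest and weakest regions, which is the fine-grained failure mode the abstract highlights.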
