Oral in Workshop: Shift happens: Crowdsourcing metrics and test datasets beyond ImageNet
Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time
Huaxiu Yao · Caroline Choi · Yoonho Lee · Pang Wei Koh · Chelsea Finn
Distribution shifts occur when the test distribution differs from the training distribution, and they can considerably degrade the performance of machine learning models deployed in the real world. While recent works have studied robustness to distribution shifts, shifts arising from the passage of time have the additional structure of timestamp metadata. Real-world examples of such shifts are underexplored, and it is unclear whether existing models can leverage trends in past distribution shifts to reliably extrapolate into the future. To address this gap, we curate Wild-Time, a benchmark of 7 datasets that reflect temporal distribution shifts arising in a variety of real-world applications. On these datasets, we systematically benchmark 9 approaches with various inductive biases. Our experiments demonstrate that existing methods are limited in their ability to handle temporal distribution shift: across all settings, we observe an average performance drop of 21% from in-distribution to out-of-distribution data.
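To make the evaluation protocol concrete, the sketch below illustrates the kind of in-distribution (ID) versus out-of-distribution (OOD) comparison the abstract describes: train on data from earlier timestamps, then evaluate on data from a nearby period (ID) and a far-future period (OOD). This is a minimal, self-contained illustration with synthetic drifting data, not the Wild-Time code or its datasets; all function names and numbers here are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_yearly_data(year, n=1000):
    """Synthetic binary task whose decision boundary drifts each year
    (a hypothetical stand-in for a real temporally shifting dataset)."""
    drift = 0.3 * (year - 2015)            # boundary rotates over time
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + drift * X[:, 1] > 0).astype(int)
    return X, y

# Train on the "past"; evaluate on a nearby year (ID) and a far-future year (OOD).
X_train, y_train = make_yearly_data(2015)
X_id, y_id = make_yearly_data(2016)        # close in time: near-ID
X_ood, y_ood = make_yearly_data(2021)      # far in the future: OOD

model = LogisticRegression().fit(X_train, y_train)
id_acc = model.score(X_id, y_id)
ood_acc = model.score(X_ood, y_ood)
print(f"ID accuracy:   {id_acc:.3f}")
print(f"OOD accuracy:  {ood_acc:.3f}")
print(f"Relative drop: {100 * (id_acc - ood_acc) / id_acc:.1f}%")
```

The reported 21% figure in the abstract is the average of exactly this kind of relative ID-to-OOD drop, computed across the benchmark's datasets and methods rather than on synthetic data.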