Test-Time Adaptation (TTA) has recently gained significant attention as a new paradigm for tackling distribution shifts. Despite the sheer number of existing methods, the inconsistent experimental conditions and lack of standardization in prior literature make it difficult to measure their actual efficacy and progress. To address this issue, we present a large-scale open-source Test-Time Adaptation Benchmark, dubbed TTAB, which includes nine state-of-the-art algorithms, a diverse array of distribution shifts, and two comprehensive evaluation protocols. Through extensive experiments, we identify three common pitfalls in prior efforts: (i) choosing appropriate hyper-parameters, especially for model selection, is exceedingly difficult due to online batch dependency; (ii) the effectiveness of TTA varies greatly depending on the quality of the model being adapted; (iii) even under optimal algorithmic conditions, existing methods still systematically struggle with certain types of distribution shifts. Our findings suggest that future research in the field should be more transparent about its experimental conditions, ensure rigorous evaluations on a broader set of models and shifts, and re-examine the assumptions underlying the potential success of TTA for practical applications.
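To make the paradigm concrete, here is a minimal sketch of one common family of TTA methods: entropy minimization on an unlabeled test batch (in the spirit of Tent-style adaptation). This is not the paper's implementation; the bias-only parameterization and the finite-difference gradients are illustrative simplifications chosen to keep the example self-contained.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class dimension.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def batch_entropy(logits):
    # Mean Shannon entropy of the predictive distribution over a batch.
    p = softmax(logits)
    return -np.sum(p * np.log(p + 1e-12), axis=1).mean()

def adapt_bias(logits, steps=50, lr=0.5, eps=1e-4):
    """Entropy-minimization TTA sketch: the frozen model's logits are
    adjusted by a learned per-class bias, updated by gradient descent
    on the batch entropy (gradients via central finite differences)."""
    b = np.zeros(logits.shape[1])
    for _ in range(steps):
        grad = np.zeros_like(b)
        for k in range(b.size):
            bp = b.copy(); bp[k] += eps
            bm = b.copy(); bm[k] -= eps
            grad[k] = (batch_entropy(logits + bp)
                       - batch_entropy(logits + bm)) / (2 * eps)
        b -= lr * grad
    return b

# Simulated logits for a "shifted" test batch (32 samples, 4 classes).
rng = np.random.default_rng(0)
logits = rng.normal(size=(32, 4))
before = batch_entropy(logits)
after = batch_entropy(logits + adapt_bias(logits))
```

Note how the updates depend entirely on the current test batch: this online batch dependency is exactly what makes hyper-parameter and model selection hard, since there is no labeled held-out set to validate the adapted model against, which is pitfall (i) above.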
Author Information
Hao Zhao (EPFL)
Yuejiang Liu (EPFL)
Alexandre Alahi (EPFL)
Tao Lin (Westlake University)