Understanding the performance of machine learning models across diverse data distributions is critically important for reliable applications. Recent empirical work finds a strong linear relationship between in-distribution (ID) and out-of-distribution (OOD) performance, but we show that this does not necessarily hold under subpopulation shifts. In this paper, we empirically show that out-of-distribution performance often has a nonlinear correlation with in-distribution performance under subpopulation shifts. To understand this phenomenon, we decompose the model's performance into its performance on each subpopulation. We show that there is a "moon shape" correlation (a parabolic uptrend curve) between the test performance on the majority subpopulation and that on the minority subpopulation. These nonlinear correlations hold across model architectures, training durations, hyperparameters, and degrees of imbalance between subpopulations. Moreover, we show that the nonlinearity increases in the presence of spurious correlations in the training data. We provide complementary theoretical and experimental analyses of this interesting phenomenon of nonlinear performance correlation across subpopulations. Finally, we discuss the implications of our findings for ML reliability and fairness.
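The decomposition underlying the analysis can be illustrated with a minimal sketch: overall test accuracy equals the per-subpopulation accuracies weighted by each subpopulation's share of the test set, acc = Σ_g p_g · acc_g. The function name, group labels, and toy data below are illustrative assumptions, not from the paper.

```python
import numpy as np

def decompose_accuracy(y_true, y_pred, groups):
    """Return overall accuracy and per-subpopulation accuracies.

    Overall accuracy is the subgroup accuracies weighted by subgroup
    proportions in the test set: acc = sum_g p_g * acc_g.
    """
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    correct = (y_true == y_pred)
    overall = correct.mean()
    per_group = {g: correct[groups == g].mean() for g in np.unique(groups)}
    return overall, per_group

# Toy check: the majority group dominates the weighted average.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
groups = ["maj", "maj", "maj", "maj", "min", "min"]
overall, per_group = decompose_accuracy(y_true, y_pred, groups)
# overall = 4/6; per_group = {"maj": 0.75, "min": 0.5}
# Check the identity: 4/6 * 0.75 + 2/6 * 0.5 == 4/6
```

Tracking the pair (acc_maj, acc_min) across checkpoints, rather than only the aggregate accuracy, is what reveals the "moon shape" curve described above.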
Author Information
Weixin Liang (Stanford University)
Yining Mao (Zhejiang University)
Yongchan Kwon (Stanford University)
Xinyu Yang (Zhejiang University)
James Zou (Stanford University)
More from the Same Authors
- 2021: MetaDataset: A Dataset of Datasets for Evaluating Distribution Shifts and Training Conflicts
  Weixin Liang · James Zou
- 2021: Stateful Performative Gradient Descent
  Zachary Izzo · James Zou · Lexing Ying
- 2022: MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts
  Weixin Liang · Xinyu Yang · James Zou
- 2022: On the nonlinear correlation of ML performance across data subpopulations
  Weixin Liang · Yining Mao · Yongchan Kwon · Xinyu Yang · James Zou
- 2022: Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning
  Weixin Liang · Yuhui Zhang · Yongchan Kwon · Serena Yeung · James Zou
- 2023 Poster: Data-OOB: Out-of-bag Estimate as a Simple and Efficient Data Value
  Yongchan Kwon · James Zou
- 2023 Poster: Data-Driven Subgroup Discovery for Linear Regression
  Zachary Izzo · Ruishan Liu · James Zou
- 2023 Poster: On the nonlinear correlation of ML performance between data subpopulations
  Weixin Liang · Yining Mao · Yongchan Kwon · Xinyu Yang · James Zou
- 2023 Poster: Discover and Cure: Concept-aware Mitigation of Spurious Correlation
  Ying-Xin Wu · Mert Yuksekgonul · Linjun Zhang · James Zou
- 2022: Invited talk #2 James Zou (Title: Machine learning to make clinical trials more efficient and diverse)
  James Zou
- 2022: GSCLIP: A Framework for Explaining Distribution Shifts in Natural Language
  Zhiying Zhu · Weixin Liang · James Zou
- 2022: 7-UP: generating in silico CODEX from a small set of immunofluorescence markers
  James Zou
- 2022: Contributed Talk 2: MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts
  Weixin Liang · Xinyu Yang · James Zou
- 2022 Poster: Improving Out-of-Distribution Robustness via Selective Augmentation
  Huaxiu Yao · Yu Wang · Sai Li · Linjun Zhang · Weixin Liang · James Zou · Chelsea Finn
- 2022 Spotlight: Improving Out-of-Distribution Robustness via Selective Augmentation
  Huaxiu Yao · Yu Wang · Sai Li · Linjun Zhang · Weixin Liang · James Zou · Chelsea Finn
- 2022 Poster: Meaningfully debugging model mistakes using conceptual counterfactual explanations
  Abubakar Abid · Mert Yuksekgonul · James Zou
- 2022 Spotlight: Meaningfully debugging model mistakes using conceptual counterfactual explanations
  Abubakar Abid · Mert Yuksekgonul · James Zou
- 2021: Poster Session
  Kishor Datta Gupta · Sebastian Schelter · Till Döhmen · Tony Ginart · Lingjiao Chen · Yongchan Kwon
- 2021: Competition over data: when does data purchase benefit users?
  Yongchan Kwon
- 2020 Poster: Principled learning method for Wasserstein distributionally robust optimization with local perturbations
  Yongchan Kwon · Wonyoung Kim · Joong-Ho (Johann) Won · Myunghee Cho Paik
- 2019 Poster: Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits
  Martin Zhang · James Zou · David Tse
- 2019 Oral: Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits
  Martin Zhang · James Zou · David Tse
- 2017 Poster: Estimating the unseen from multiple populations
  Aditi Raghunathan · Greg Valiant · James Zou
- 2017 Poster: Learning Latent Space Models with Angular Constraints
  Pengtao Xie · Yuntian Deng · Yi Zhou · Abhimanu Kumar · Yaoliang Yu · James Zou · Eric Xing
- 2017 Talk: Learning Latent Space Models with Angular Constraints
  Pengtao Xie · Yuntian Deng · Yi Zhou · Abhimanu Kumar · Yaoliang Yu · James Zou · Eric Xing
- 2017 Talk: Estimating the unseen from multiple populations
  Aditi Raghunathan · Greg Valiant · James Zou