Poster in Workshop: Data-centric Machine Learning Research (DMLR): Datasets for Foundation Models
Spurious Correlations in Machine Learning: A Survey
Wenqian Ye · Guangtao Zheng · Xu Cao · Yunsheng Ma · Aidong Zhang
Machine learning systems are known to be sensitive to spurious correlations between non-essential features of the inputs (e.g., background, texture, and secondary objects) and the corresponding labels. These features and their correlations with the labels are called "spurious" because they tend to change when real-world data distributions shift, which can degrade a model's generalization and robustness. In this paper, we review this issue and provide a taxonomy of current state-of-the-art methods for addressing spurious correlations in machine learning models. We also summarize existing datasets, benchmarks, and metrics to aid future research. The paper concludes with a discussion of recent advances and open challenges in this field, aiming to provide useful insights for researchers in related domains.