Poster
Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and Over-Parameterization
Simone Bombari · Marco Mondelli
East Exhibition Hall A-B #E-2706
Machine learning models often learn misleading patterns from their training data. For example, a neural network might classify a photo of a person swimming in the ocean as a boat simply because, during training, it saw water only as the background of boats. These patterns are known as spurious correlations, and they can harm the fairness, accuracy, and reliability of AI models. In this study, we theoretically explore why and how these misleading correlations arise. We consider the mathematically tractable setting of linear regression, and we provide insights that may also be statistically relevant for more complex models. For example, we attempt to formalize the known fact that neural networks prefer to learn "easy" patterns: intuitively, a blue background is easier to recognize than a heterogeneously shaped object (e.g., a boat). Furthermore, we find that, unfortunately, there is sometimes a trade-off between maximizing the accuracy of a model and minimizing the amount of spurious correlations. Finally, we extend our theoretical results to the case where models become larger and larger, as with the deep neural networks in use today.
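As a rough illustration of how regularization can interact with a spurious feature in linear regression, the sketch below fits ridge regression on synthetic data. Everything in it is an assumption made for illustration and is not taken from the paper: the data model (a "core" block that determines the label and a "spurious" block that is merely correlated with it), the names `ridge_fit` and `spurious_weight_norm`, the parameter values, and the use of the weight norm on the spurious block as a crude proxy for reliance on spurious features.

```python
import numpy as np


def ridge_fit(Z, g, lam):
    """Closed-form ridge solution: argmin_w ||Z w - g||^2 + lam * ||w||^2."""
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ g)


def spurious_weight_norm(lam, n=200, d=20, rho=0.8, seed=0):
    """Norm of the ridge weights placed on the spurious block (hypothetical proxy).

    Assumed data model: core features x ~ N(0, I); spurious features y have
    unit variance and correlation rho with x, coordinate by coordinate; the
    label depends on the core features only.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))                       # core features
    noise = rng.standard_normal((n, d))
    y = rho * x + np.sqrt(1.0 - rho**2) * noise           # spurious features
    theta = rng.standard_normal(d) / np.sqrt(d)           # ground-truth core weights
    g = x @ theta                                         # noiseless label, core only
    w = ridge_fit(np.hstack([x, y]), g, lam)
    return float(np.linalg.norm(w[d:]))                   # weights on the spurious block


if __name__ == "__main__":
    for lam in (1e-8, 1.0, 10.0, 100.0):
        print(f"lam={lam:g}  spurious weight norm={spurious_weight_norm(lam):.4f}")
```

In this toy setup, with more samples than features and noiseless labels, a near-zero penalty recovers the true weights and puts essentially no weight on the spurious block, while a larger penalty shifts weight onto it because shrinkage favors the direction shared by the correlated features. This only illustrates that the choice of regularization affects reliance on spurious features; it does not reproduce the paper's definitions or results.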