Machine learning models often perform poorly under distribution shift. But can we understand how a particular distribution shift will affect a model? We approach this in two parts: (1) explaining the shift itself, and (2) explaining the model's behavior.
First, we train a language model to describe the difference between two distributions. The model produces natural language explanations that allow humans to distinguish random draws from the two distributions. This helps reveal subtle but important shifts that may not be apparent from manual inspection, and can also be used to uncover spurious cues. We use this to identify "shortcuts" that models rely on, and construct a distribution shift that breaks the shortcut and decreases model performance.
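One way to picture the idea is a propose-and-verify loop: a proposer suggests candidate natural language descriptions of how the two distributions differ, and a verifier checks, sample by sample, whether each description holds, so that the descriptions which best separate the two sets rise to the top. The sketch below (not the paper's code) illustrates only that scoring loop; the toy samples, the candidate list, and the keyword_judge stand-in are illustrative assumptions, since in practice both the proposer and the verifier are language models.

# Minimal propose-and-verify sketch: score how well a candidate description
# separates samples from distribution A and distribution B.

def separation_score(hypothesis, samples_a, samples_b, satisfies):
    """How much more often the description holds on B than on A (range [-1, 1])."""
    rate_a = sum(satisfies(hypothesis, x) for x in samples_a) / len(samples_a)
    rate_b = sum(satisfies(hypothesis, x) for x in samples_b) / len(samples_b)
    return rate_b - rate_a

def keyword_judge(hypothesis, text):
    # Stand-in for an LM verifier that answers "does this text match the description?"
    return hypothesis.lower() in text.lower()

if __name__ == "__main__":
    dist_a = ["great plot and acting", "loved the soundtrack", "a fun family film"]
    dist_b = ["the battery died in an hour", "refund took weeks", "broken on arrival"]
    candidates = ["refund", "film", "battery", "soundtrack"]  # proposer output (toy)
    ranked = sorted(
        candidates,
        key=lambda h: separation_score(h, dist_a, dist_b, keyword_judge),
        reverse=True,
    )
    for h in ranked:
        score = separation_score(h, dist_a, dist_b, keyword_judge)
        print(round(score, 2), h)

Descriptions with a large positive score hold much more often on the second distribution than on the first, which is what makes them useful to a human trying to tell the two apart.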
Having built tools to understand how the data is shifted, we next investigate whether model explanations (such as Grad-CAM) can be used to predict the behavior of models under distribution shift. Here, the results are largely negative. We construct models with specific defects (such as backdoors or spurious cues) that affect out-of-distribution performance, and measure whether model explanations can distinguish these from regular, non-defective models. Detection rates are typically low and in some cases trivial. This underscores the need to improve model explanations if they are to be used as a reliable tool for model debugging.
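As a rough illustration of the kind of test involved (a minimal sketch, not the protocol from the talk): compute Grad-CAM saliency for a clean model and for a model with a planted backdoor, then check whether a simple statistic of the saliency map separates the two. The tiny CNN, the corner trigger patch, and the saliency-mass statistic below are all illustrative assumptions.

# Minimal Grad-CAM probe: does saliency mass on a (hypothetical) trigger patch
# distinguish a backdoored model from a clean one?
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):
        feats = self.features(x)            # (B, 32, H, W) last conv feature map
        pooled = feats.mean(dim=(2, 3))     # global average pool
        return self.head(pooled), feats

def grad_cam(model, x, target_class):
    """Grad-CAM on the last conv feature map for one batch of images."""
    logits, feats = model(x)
    feats.retain_grad()
    logits[:, target_class].sum().backward()
    weights = feats.grad.mean(dim=(2, 3), keepdim=True)   # channel importance
    cam = F.relu((weights * feats).sum(dim=1))             # (B, H, W)
    cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)
    return cam.detach()

def trigger_saliency_mass(cam, patch=4):
    """Fraction of saliency falling on the top-left corner where the toy trigger sits."""
    total = cam.sum(dim=(1, 2)) + 1e-8
    corner = cam[:, :patch, :patch].sum(dim=(1, 2))
    return (corner / total).mean().item()

if __name__ == "__main__":
    clean, backdoored = TinyCNN(), TinyCNN()   # stand-ins for trained models
    x = torch.rand(8, 3, 32, 32)
    x[:, :, :4, :4] = 1.0                       # stamp a toy trigger patch
    for name, model in [("clean", clean), ("backdoored", backdoored)]:
        cam = grad_cam(model, x, target_class=0)
        print(name, "saliency mass on trigger:", round(trigger_saliency_mass(cam), 3))

If the statistic for the defective model is not clearly separated from that of ordinary models, the explanation method gives little warning of the defect, which is the failure mode the negative results above point to.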
Author Information
Jacob Steinhardt (UC Berkeley)
More from the Same Authors
- 2022 Poster: Scaling Out-of-Distribution Detection for Real-World Settings »
  Dan Hendrycks · Steven Basart · Mantas Mazeika · Andy Zou · joseph kwon · Mohammadreza Mostajabi · Jacob Steinhardt · Dawn Song
- 2022 Poster: Predicting Out-of-Distribution Error with the Projection Norm »
  Yaodong Yu · Zitong Yang · Alexander Wei · Yi Ma · Jacob Steinhardt
- 2022 Poster: More Than a Toy: Random Matrix Models Predict How Real-World Neural Representations Generalize »
  Alexander Wei · Wei Hu · Jacob Steinhardt
- 2022 Spotlight: Scaling Out-of-Distribution Detection for Real-World Settings »
  Dan Hendrycks · Steven Basart · Mantas Mazeika · Andy Zou · joseph kwon · Mohammadreza Mostajabi · Jacob Steinhardt · Dawn Song
- 2022 Spotlight: Predicting Out-of-Distribution Error with the Projection Norm »
  Yaodong Yu · Zitong Yang · Alexander Wei · Yi Ma · Jacob Steinhardt
- 2022 Spotlight: More Than a Toy: Random Matrix Models Predict How Real-World Neural Representations Generalize »
  Alexander Wei · Wei Hu · Jacob Steinhardt
- 2022 Poster: Describing Differences between Text Distributions with Natural Language »
  Ruiqi Zhong · Charlie Snell · Dan Klein · Jacob Steinhardt
- 2022 Spotlight: Describing Differences between Text Distributions with Natural Language »
  Ruiqi Zhong · Charlie Snell · Dan Klein · Jacob Steinhardt
- 2020 Poster: Rethinking Bias-Variance Trade-off for Generalization of Neural Networks »
  Zitong Yang · Yaodong Yu · Chong You · Jacob Steinhardt · Yi Ma