
Distribution Shift Through the Lens of Explanations
Jacob Steinhardt

Sat Jul 23 07:30 AM -- 08:10 AM (PDT)

Machine learning models often perform poorly under distribution shift. But can we understand how a particular distribution shift will affect a model? We approach this in two parts: (1) explaining the shift itself, and (2) explaining the model's behavior.

First, we train a language model to describe the difference between two distributions. The model produces natural language explanations that allow humans to distinguish random draws from the two distributions. This helps reveal subtle but important shifts that may not be apparent from manual inspection, and can also be used to uncover spurious cues. We use this to identify "shortcuts" that models rely on, and construct a distribution shift that breaks the shortcut and decreases model performance.
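The describe-then-verify idea in this paragraph can be illustrated with a toy sketch. Everything below is a hypothetical stand-in: in the actual work a language model proposes the natural-language descriptions, and humans (or a model) verify them by distinguishing draws; here, candidate descriptions are paired with hand-written predicates and scored by how well they separate samples from the two distributions.

```python
# Toy stand-in for describe-then-verify: propose natural-language hypotheses
# about how two text distributions differ, then keep the one whose checkable
# predicate best separates draws from the two. (Hypothetical; the talk's
# system uses a language model as proposer and verifier.)

# Two tiny "distributions" differing in a subtle cue: d1 always ends with "!".
d0 = ["the movie was fine", "service was slow", "plot felt thin"]
d1 = ["the movie was fine!", "service was slow!", "plot felt thin!"]

# Candidate descriptions, each paired with a predicate a verifier could apply.
hypotheses = {
    "samples mention movies": lambda s: "movie" in s,
    "samples end with an exclamation mark": lambda s: s.endswith("!"),
}

def separation_accuracy(pred):
    # Fraction of draws correctly assigned to d1 (predicate true) vs d0 (false).
    correct = sum(pred(s) for s in d1) + sum(not pred(s) for s in d0)
    return correct / (len(d0) + len(d1))

best = max(hypotheses, key=lambda h: separation_accuracy(hypotheses[h]))
print(best)  # -> samples end with an exclamation mark
```

A description that lets a verifier reliably tell the two distributions apart (here, the exclamation-mark cue at accuracy 1.0) is exactly the kind of subtle, spurious cue the abstract describes exploiting to construct a shortcut-breaking shift.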

Having built tools to understand how the data is shifted, we next investigate whether model explanations (such as Grad-CAM) can be used to predict the behavior of models under distribution shift. Here, the results are largely negative. We construct models with specific defects (such as backdoors or spurious cues) that affect out-of-distribution performance, and measure whether model explanations can distinguish these from regular, non-defective models. Detection rates are typically low and in some cases no better than chance. This underscores the need to improve model explanations if they are to be used as a reliable tool for model debugging.

Author Information

Jacob Steinhardt (UC Berkeley)
