

Oral in Workshop: The Second Workshop on Spurious Correlations, Invariance and Stability

Where Does My Model Underperform? A Human Evaluation of Slice Discovery Algorithms

Nari Johnson · Ángel Alexander Cabrera · Gregory Plumb · Ameet Talwalkar


Abstract: A growing number of works propose tools to help stakeholders form hypotheses about the behavior of machine learning models. We focus our study on slice discovery algorithms: automated methods that aim to group together coherent and high-error "slices" (i.e. subsets) of data. While these tools purport to help users identify where (on which subgroups) their model underperforms, there has been little evaluation of whether they help users achieve their proposed goals. We run a controlled user study $(N = 15)$ to evaluate whether the slices output by two existing slice discovery algorithms help users form correct hypotheses about an image classification model. Our results provide positive evidence that existing tools offer some benefit relative to a naive baseline, and challenge dominant assumptions shared by past work.
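As context (not taken from the paper itself): one common recipe in the slice discovery literature is to cluster validation examples in a pretrained embedding space and then rank the resulting clusters by error rate, surfacing candidate coherent, high-error slices. The sketch below illustrates that generic recipe only; the function name discover_slices and all parameter choices are hypothetical, and this is not the specific pair of algorithms evaluated in the study.

    import numpy as np
    from sklearn.cluster import KMeans

    def discover_slices(embeddings, errors, n_clusters=20, top_k=5):
        """Cluster examples in embedding space, then rank clusters by
        mean error rate to surface candidate high-error slices.

        embeddings: (n_examples, d) array of per-example features
        errors:     (n_examples,) array, 1.0 if misclassified else 0.0
        """
        labels = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=0).fit_predict(embeddings)
        slices = []
        for c in range(n_clusters):
            idx = np.flatnonzero(labels == c)
            slices.append((idx, errors[idx].mean()))
        # Highest-error clusters first; return the top_k as candidate slices.
        slices.sort(key=lambda s: s[1], reverse=True)
        return slices[:top_k]

In a typical workflow, embeddings would come from the classifier's penultimate layer (or a separate pretrained encoder) and errors would be computed as (predictions != ground_truth); a user would then inspect the examples in each returned slice to form hypotheses about where the model underperforms.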
