Oral in Workshop: The Second Workshop on Spurious Correlations, Invariance and Stability
Where Does My Model Underperform?: A Human Evaluation of Slice Discovery Algorithms
Nari Johnson · Ángel Alexander Cabrera · Gregory Plumb · Ameet Talwalkar
Abstract:
A growing number of works propose tools to help stakeholders form hypotheses about the behavior of machine learning models. We focus our study on slice discovery algorithms: automated methods that aim to group together coherent, high-error "slices" (i.e. subsets) of data. While these tools purport to help users identify where (i.e. on which subgroups) their model underperforms, there has been little evaluation of whether they actually achieve this goal. We run a controlled user study $(N = 15)$ to evaluate whether the slices output by two existing slice discovery algorithms help users form correct hypotheses about an image classification model. Our results provide positive evidence that existing tools offer some benefit relative to a naive baseline, and challenge dominant assumptions shared by past work.