Where Does My Model Underperform?: A Human Evaluation of Slice Discovery Algorithms
Nari Johnson · Ángel Alexander Cabrera · Gregory Plumb · Ameet Talwalkar
Event URL: https://openreview.net/forum?id=HnyYGxRliS
A growing number of works propose tools to help stakeholders form hypotheses about the behavior of machine learning models. We focus our study on slice discovery algorithms: automated methods that aim to group together coherent and high-error "slices" (i.e., subsets) of data. While these tools purport to help users identify where (on which subgroups) their model underperforms, there has been little evaluation of whether they help users achieve their proposed goals. We run a controlled user study $(N = 15)$ to evaluate whether the slices output by two existing slice discovery algorithms help users form correct hypotheses about an image classification model. Our results offer positive evidence that existing tools provide benefit relative to a naive baseline, and challenge dominant assumptions shared by past work.

Author Information

Nari Johnson (Carnegie Mellon University)
Ángel Alexander Cabrera (Carnegie Mellon University)
Gregory Plumb (Carnegie Mellon University)
Ameet Talwalkar (Carnegie Mellon University)
