Poster
Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information
Kawin Ethayarajh · Yejin Choi · Swabha Swayamdipta
Estimating the difficulty of a dataset typically involves comparing state-of-the-art models to humans; the bigger the performance gap, the harder the dataset is said to be. However, this comparison provides little understanding of how difficult each instance in a given distribution is, or what attributes make the dataset difficult for a given model. To address these questions, we frame dataset difficulty---w.r.t. a model $\mathcal{V}$---as the lack of $\mathcal{V}$-usable information (Xu et al., 2019), where a lower value indicates a more difficult dataset for $\mathcal{V}$. We further introduce pointwise $\mathcal{V}$-information (PVI) for measuring the difficulty of individual instances w.r.t. a given distribution. While standard evaluation metrics typically only compare different models for the same dataset, $\mathcal{V}$-usable information and PVI also permit the converse: for a given model $\mathcal{V}$, we can compare different datasets, as well as different instances/slices of the same dataset. Furthermore, our framework allows for the interpretability of different input attributes via transformations of the input, which we use to discover annotation artefacts in widely-used NLP benchmarks.
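As a rough illustration of the pointwise measure described in the abstract, the sketch below assumes the definition commonly given for PVI: the base-2 log-probability that a model fine-tuned on real inputs assigns to the gold label, minus the base-2 log-probability assigned by a model from the same family fine-tuned on null (empty) inputs. The function names `pvi` and `v_information` and the example probabilities are hypothetical and are not taken from the authors' code. Averaging PVI over held-out instances is assumed here as the estimate of dataset-level $\mathcal{V}$-usable information.

```python
import math
from typing import Sequence


def pvi(p_gold_with_input: float, p_gold_null_input: float) -> float:
    """Pointwise V-information in bits (assumed definition).

    p_gold_with_input: probability of the gold label from a model
        fine-tuned on (input, label) pairs.
    p_gold_null_input: probability of the gold label from a model of the
        same family fine-tuned on (empty input, label) pairs.
    """
    return math.log2(p_gold_with_input) - math.log2(p_gold_null_input)


def v_information(
    p_with_input: Sequence[float], p_null_input: Sequence[float]
) -> float:
    """Estimate V-usable information as the mean PVI over held-out instances."""
    if len(p_with_input) != len(p_null_input) or not p_with_input:
        raise ValueError("need two non-empty sequences of equal length")
    scores = [pvi(a, b) for a, b in zip(p_with_input, p_null_input)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical gold-label probabilities for three held-out instances.
    with_input = [0.5, 0.25, 0.5]
    null_input = [0.25, 0.5, 0.5]
    for a, b in zip(with_input, null_input):
        print(f"PVI = {pvi(a, b):+.2f} bits")
    print(f"mean PVI = {v_information(with_input, null_input):.2f} bits")
```

Under these assumptions, a higher PVI marks an instance that is easier for the model family, and a negative PVI marks an instance where seeing the input made the gold label less likely than the label prior alone.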
Author Information
Kawin Ethayarajh (Stanford University)
Yejin Choi (University of Washington)
Swabha Swayamdipta (Allen Institute for AI / USC)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Oral: Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information
  Tue. Jul 19th 03:00 -- 03:20 PM, Room 301 - 303
More from the Same Authors
- 2023 Poster: Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling
  Kolby Nottingham · Prithviraj Ammanabrolu · Alane Suhr · Yejin Choi · Hannaneh Hajishirzi · Sameer Singh · Roy Fox
- 2020 Poster: Adversarial Filters of Dataset Biases
  Ronan Le Bras · Swabha Swayamdipta · Chandra Bhagavatula · Rowan Zellers · Matthew Peters · Ashish Sabharwal · Yejin Choi