Timezone: »
The performance of a vision classifier on a given test set is usually measured by its accuracy. For reliable machine learning systems, however, it is important to avoid the existence of areas of the input space where they fail severely. To reflect this, we argue, that a single number does not provide a complete enough picture even for a fixed test set, as there might be particular classes or subtasks where a model that is generally accurate performs unexpectedly poorly. Without using new data, we motivate and establish a wide selection of interesting worst-case performance metrics which can be evaluated besides accuracy on a given test set. Some of these metrics can be extended when a grouping of the original classes into superclasses is available, indicating if the model is exceptionally bad at handling inputs from one superclass.
Author Information
Julian Bitterwolf (University of Tübingen)
Alexander Meinke (University of Tübingen)
Valentyn Boreiko (Eberhard-Karls-Universität Tübingen)
Matthias Hein (University of Tübingen)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 : Classifiers Should Do Well Even on Their Worst Classes »
Dates n/a. Room
More from the Same Authors
-
2022 : Provably Adversarially Robust Detection of Out-of-Distribution Data (Almost) for Free »
Alexander Meinke · Julian Bitterwolf · Matthias Hein -
2022 : Sound randomized smoothing in floating-point arithmetics »
Václav Voráček · Matthias Hein -
2022 : Sound randomized smoothing in floating-point arithmetics »
Václav Voráček · Matthias Hein -
2022 : Lost in Translation: Modern Image Classifiers still degrade even under simple Translations »
Leander Kurscheidt · Matthias Hein -
2023 Poster: In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation »
Julian Bitterwolf · Maximilian Müller · Matthias Hein -
2023 Poster: A modern look at the relationship between sharpness and generalization »
Maksym Andriushchenko · Francesco Croce · Maximilian Müller · Matthias Hein · Nicolas Flammarion -
2023 Poster: Improving $\ell_1$-Certified Robustness via Randomized Smoothing by Leveraging Box Constraints »
Václav Voráček · Matthias Hein -
2022 : Closing remarks »
Evgenia Rusak · Roland S. Zimmermann · Julian Bitterwolf · Steffen Schneider -
2022 : Lost in Translation: Modern Image Classifiers still degrade even under simple Translations »
Leander Kurscheidt · Matthias Hein -
2022 : On the interplay of adversarial robustness and architecture components: patches, convolution and attention »
Francesco Croce · Matthias Hein -
2022 Workshop: Shift happens: Crowdsourcing metrics and test datasets beyond ImageNet »
Roland S. Zimmermann · Julian Bitterwolf · Evgenia Rusak · Steffen Schneider · Matthias Bethge · Wieland Brendel · Matthias Hein -
2022 : Introduction and opening remarks »
Julian Bitterwolf · Roland S. Zimmermann · Steffen Schneider · Evgenia Rusak -
2022 Poster: Breaking Down Out-of-Distribution Detection: Many Methods Based on OOD Training Data Estimate a Combination of the Same Core Quantities »
Julian Bitterwolf · Alexander Meinke · Maximilian Augustin · Matthias Hein -
2022 Spotlight: Breaking Down Out-of-Distribution Detection: Many Methods Based on OOD Training Data Estimate a Combination of the Same Core Quantities »
Julian Bitterwolf · Alexander Meinke · Maximilian Augustin · Matthias Hein -
2022 Poster: Adversarial Robustness against Multiple and Single $l_p$-Threat Models via Quick Fine-Tuning of Robust Classifiers »
Francesco Croce · Matthias Hein -
2022 Poster: Provably Adversarially Robust Nearest Prototype Classifiers »
Václav Voráček · Matthias Hein -
2022 Poster: Evaluating the Adversarial Robustness of Adaptive Test-time Defenses »
Francesco Croce · Sven Gowal · Thomas Brunner · Evan Shelhamer · Matthias Hein · Taylan Cemgil -
2022 Spotlight: Adversarial Robustness against Multiple and Single $l_p$-Threat Models via Quick Fine-Tuning of Robust Classifiers »
Francesco Croce · Matthias Hein -
2022 Spotlight: Evaluating the Adversarial Robustness of Adaptive Test-time Defenses »
Francesco Croce · Sven Gowal · Thomas Brunner · Evan Shelhamer · Matthias Hein · Taylan Cemgil -
2022 Spotlight: Provably Adversarially Robust Nearest Prototype Classifiers »
Václav Voráček · Matthias Hein -
2021 : Discussion Panel #1 »
Hang Su · Matthias Hein · Liwei Wang · Sven Gowal · Jan Hendrik Metzen · Henry Liu · Yisen Wang -
2021 : Invited Talk #3 »
Matthias Hein -
2021 : Provably Robust Detection of Out-of-distribution Data (almost) for free »
Alexander Meinke -
2021 Poster: Mind the Box: $l_1$-APGD for Sparse Adversarial Attacks on Image Classifiers »
Francesco Croce · Matthias Hein -
2021 Spotlight: Mind the Box: $l_1$-APGD for Sparse Adversarial Attacks on Image Classifiers »
Francesco Croce · Matthias Hein -
2020 : Keynote #1 Matthias Hein »
Matthias Hein -
2020 Poster: Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack »
Francesco Croce · Matthias Hein -
2020 Poster: Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks »
Francesco Croce · Matthias Hein -
2020 Poster: Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks »
Agustinus Kristiadi · Matthias Hein · Philipp Hennig -
2020 Poster: Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks »
David Stutz · Matthias Hein · Bernt Schiele -
2019 Poster: Spectral Clustering of Signed Graphs via Matrix Power Means »
Pedro Mercado · Francesco Tudisco · Matthias Hein -
2019 Oral: Spectral Clustering of Signed Graphs via Matrix Power Means »
Pedro Mercado · Francesco Tudisco · Matthias Hein