Classifiers Should Do Well Even on Their Worst Classes
Julian Bitterwolf · Alexander Meinke · Valentyn Boreiko · Matthias Hein
Event URL: https://openreview.net/forum?id=QxIXCVYJ2WP

The performance of a vision classifier on a given test set is usually measured by its accuracy. For reliable machine learning systems, however, it is important to avoid areas of the input space where they fail severely. To reflect this, we argue that a single number does not provide a complete enough picture even for a fixed test set, as there may be particular classes or subtasks on which a generally accurate model performs unexpectedly poorly. Without using new data, we motivate and establish a wide selection of worst-case performance metrics that can be evaluated alongside accuracy on a given test set. Some of these metrics can be extended when a grouping of the original classes into superclasses is available, indicating whether the model is exceptionally bad at handling inputs from one superclass.
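As an illustration of the kind of worst-case metric the abstract describes, the sketch below (our own illustration, not code from the paper) computes per-class accuracy on a fixed test set, takes its minimum as a worst-class accuracy, and extends this to a worst-superclass accuracy given a mapping from classes to superclasses. Function names and the integer-label encoding are assumptions for the example.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, num_classes):
    """Accuracy restricted to the test samples of each class."""
    return np.array([
        np.mean(y_pred[y_true == c] == c) for c in range(num_classes)
    ])

def worst_class_accuracy(y_true, y_pred, num_classes):
    """Minimum over the per-class accuracies: the class the model handles worst."""
    return per_class_accuracy(y_true, y_pred, num_classes).min()

def worst_superclass_accuracy(y_true, y_pred, superclass_of):
    """Minimum accuracy over groups of classes.

    `superclass_of[c]` gives the superclass index of fine class `c`;
    the accuracy of a superclass is computed over all of its samples.
    """
    sup_true = superclass_of[y_true]          # superclass of each test sample
    correct = (y_pred == y_true)              # per-sample correctness
    return min(correct[sup_true == s].mean()
               for s in np.unique(superclass_of))
```

A model can score well on average while one of these minima is low; for example, with four classes where the model answers every class-3 sample wrongly, overall accuracy may still look acceptable even though the worst-class accuracy is zero.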

Author Information

Julian Bitterwolf (University of Tübingen)
Alexander Meinke (University of Tübingen)
Valentyn Boreiko (Eberhard-Karls-Universität Tübingen)
Matthias Hein (University of Tübingen)
