The labor-intensive annotation process for semantic segmentation datasets is often error-prone, since humans struggle to label every pixel correctly. We study algorithms that automatically detect such annotation errors, in particular methods that score label quality such that the images with the lowest scores are the least likely to be correctly labeled. This helps prioritize which data to review to ensure a high-quality training/evaluation dataset, which is critical in sensitive applications such as medical imaging and autonomous vehicles. Our label quality scores are widely applicable: they rely only on probabilistic predictions from a trained segmentation model, so any model architecture and training procedure can be used. Here we study 7 different label quality scoring methods, used in conjunction with either a DeepLabV3+ or an FPN segmentation model, to detect annotation errors in a version of the SYNTHIA dataset. Precision-recall evaluations reveal one score, the soft-minimum of the model-estimated likelihoods of each pixel's annotated class, that is particularly effective at identifying mislabeled images across multiple types of annotation error.
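To make the highlighted score concrete, here is a minimal NumPy sketch. It assumes the model outputs a (K, H, W) array of class probabilities per image and the annotation is an (H, W) array of integer class ids. The soft-minimum is written as a weighted average of per-pixel likelihoods with weights softmax(-score / temperature); this form and the default temperature of 0.1 are assumptions for illustration, not necessarily the paper's exact formulation, and the function names are hypothetical.

```python
import numpy as np


def pixel_self_confidence(pred_probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Model-estimated likelihood of each pixel's annotated class.

    pred_probs: (K, H, W) predicted class probabilities for one image.
    labels:     (H, W) integer annotated class for each pixel.
    Returns an (H, W) array with entry [i, j] = pred_probs[labels[i, j], i, j].
    """
    rows, cols = np.indices(labels.shape)
    return pred_probs[labels, rows, cols]


def softmin_score(pixel_scores: np.ndarray, temperature: float = 0.1) -> float:
    """Soft-minimum of per-pixel scores (hypothetical helper; temperature=0.1 is assumed).

    Weighted average of the scores with weights softmax(-score / temperature), so
    low-likelihood pixels dominate; as temperature -> 0 this approaches the plain minimum.
    """
    s = np.asarray(pixel_scores, dtype=float).ravel()
    logits = -s / temperature
    logits -= logits.max()  # subtract the max logit for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()
    return float(np.dot(weights, s))


def rank_images(pred_probs_list, labels_list, temperature: float = 0.1):
    """Score every image; return (scores, image indices sorted most-suspicious first)."""
    scores = np.array([
        softmin_score(pixel_self_confidence(p, l), temperature)
        for p, l in zip(pred_probs_list, labels_list)
    ])
    return scores, np.argsort(scores)  # lowest score = most likely mislabeled


# Toy example: 2 classes, 2x2 images; the model predicts class 0 everywhere with probability 0.9.
pred_probs = np.stack([np.full((2, 2), 0.9), np.full((2, 2), 0.1)])
clean_labels = np.zeros((2, 2), dtype=int)
noisy_labels = clean_labels.copy()
noisy_labels[1, 1] = 1  # one pixel annotated with a class the model considers unlikely

scores, order = rank_images([pred_probs, pred_probs], [clean_labels, noisy_labels])
print(scores)  # approximately [0.9, 0.1008]
print(order)   # [1 0]: the image with the suspicious pixel is flagged first
```

A single low-likelihood pixel pulls the noisy image's score close to 0.1, whereas a plain mean over its four pixels would give 0.7. This is why a soft-minimum can surface images with small localized errors that averaging would dilute.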
Author Information
Vedang Lad (Massachusetts Institute of Technology)
Jonas Mueller (Cleanlab)
More from the Same Authors
2021 : Multimodal AutoML on Structured Tables with Text Fields
Xingjian Shi · Jonas Mueller · Nick Erickson · Mu Li · Alex Smola
2021 : Continuous Doubly Constrained Batch Reinforcement Learning
Rasool Fakoor · Jonas Mueller · Kavosh Asadi · Pratik Chaudhari · Alex Smola
2022 : Adaptive Interest for Emphatic Reinforcement Learning
Martin Klissarov · Rasool Fakoor · Jonas Mueller · Kavosh Asadi · Taesup Kim · Alex Smola
2022 : Back to the Basics: Revisiting Out-of-Distribution Detection Baselines
Johnson Kuan · Jonas Mueller
2023 : How to Cope with Gradual Data Drift?
Rasool Fakoor · Jonas Mueller · Zachary Lipton · Pratik Chaudhari · Alex Smola
2023 : Detecting Dataset Drift and Non-IID Sampling via k-Nearest Neighbors
Jesse Cummings · Jonas Mueller · Elías Snorrason
2023 : Detecting Errors in Numerical Data via any Regression Model
Hang Zhou · Jonas Mueller · Mayank Kumar · Jane-Ling Wang · Jing Lei
2023 : ObjectLab: Automated Diagnosis of Mislabeled Images in Object Detection Data
Ulyana Tkachenko · Aditya Thyagarajan · Jonas Mueller
2022 : Model-Agnostic Label Quality Scoring to Detect Real-World Label Errors
Jonas Mueller
2021 : Q&A Contributed Talk
Jonas Mueller
2021 : Contributed Talk: Multimodal AutoML on Structured Tables with Text Fields
Jonas Mueller
2021 Poster: Deep Learning for Functional Data Analysis with Adaptive Basis Layers
Junwen Yao · Jonas Mueller · Jane-Ling Wang
2021 Spotlight: Deep Learning for Functional Data Analysis with Adaptive Basis Layers
Junwen Yao · Jonas Mueller · Jane-Ling Wang
2020 : 1.2 AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data
Jonas Mueller
2020 Poster: Educating Text Autoencoders: Latent Representation Guidance via Denoising
Tianxiao Shen · Jonas Mueller · Regina Barzilay · Tommi Jaakkola