We present a straightforward statistical test to detect certain violations of the assumption that the data are Independent and Identically Distributed (IID). The specific form of violation considered is common across real-world applications: whether the examples are ordered in the dataset such that almost adjacent examples tend to have more similar feature values (e.g. due to distributional drift, or attractive interactions between datapoints). Based on a k-Nearest Neighbors estimate, our approach can be used to audit any multivariate numeric data as well as other data types (image, text, audio, etc.) that can be numerically represented, perhaps via model embeddings. Compared with existing methods to detect drift or auto-correlation, our approach is both applicable to more types of data and also able to detect a wider variety of IID violations in practice.
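The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' exact procedure, which the abstract does not spell out; it is one plausible instance of a k-Nearest Neighbors ordering test, written under stated assumptions. The function name `knn_ordering_pvalue`, the choice of statistic (mean index gap between each example and its feature-space neighbors), and the permutation null are all hypothetical choices made for illustration.

```python
import numpy as np


def knn_ordering_pvalue(X, k=5, n_permutations=200, seed=0):
    """Permutation test: are feature-space neighbors also close in dataset order?

    Hypothetical sketch, not the paper's exact test. A small p-value suggests
    that examples near each other in the ordering have similar features.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    rng = np.random.default_rng(seed)

    # Brute-force pairwise squared Euclidean distances (fine for small n).
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # an example is not its own neighbor

    # Indices of the k nearest neighbors of every example (unordered).
    nbrs = np.argpartition(d2, k, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()

    # Observed statistic: mean gap in dataset position between neighbors.
    observed = np.abs(rows - cols).mean()

    # Null distribution: shuffle the dataset order, keep the neighbor graph.
    null_stats = np.empty(n_permutations)
    for b in range(n_permutations):
        perm = rng.permutation(n)
        null_stats[b] = np.abs(perm[rows] - perm[cols]).mean()

    # One-sided p-value: small index gaps are evidence against IID ordering.
    return (1 + np.sum(null_stats <= observed)) / (1 + n_permutations)


# Synthetic example: the first feature drifts steadily with dataset position.
rng = np.random.default_rng(42)
n = 200
X_drift = rng.normal(size=(n, 2))
X_drift[:, 0] += np.linspace(0.0, 20.0, n)
p_drift = knn_ordering_pvalue(X_drift)

# Same generator without drift, for comparison.
X_iid = rng.normal(size=(n, 2))
p_iid = knn_ordering_pvalue(X_iid)
```

For non-numeric data such as images or text, `X` would hold model embeddings rather than raw features, as the abstract suggests. The brute-force distance matrix uses O(n²) memory, so a dedicated nearest-neighbor index would be a natural substitute for larger datasets.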
Author Information
Jesse Cummings (MIT)
Jonas Mueller (Cleanlab)
Elías Snorrason (Cleanlab)
More from the Same Authors
- 2021 : Multimodal AutoML on Structured Tables with Text Fields
  Xingjian Shi · Jonas Mueller · Nick Erickson · Mu Li · Alex Smola
- 2021 : Continuous Doubly Constrained Batch Reinforcement Learning
  Rasool Fakoor · Jonas Mueller · Kavosh Asadi · Pratik Chaudhari · Alex Smola
- 2022 : Adaptive Interest for Emphatic Reinforcement Learning
  Martin Klissarov · Rasool Fakoor · Jonas Mueller · Kavosh Asadi · Taesup Kim · Alex Smola
- 2022 : Back to the Basics: Revisiting Out-of-Distribution Detection Baselines
  Johnson Kuan · Jonas Mueller
- 2023 : How to Cope with Gradual Data Drift?
  Rasool Fakoor · Jonas Mueller · Zachary Lipton · Pratik Chaudhari · Alex Smola
- 2023 : Estimating label quality and errors in semantic segmentation data via any model
  Vedang Lad · Jonas Mueller
- 2023 : Detecting Errors in Numerical Data via any Regression Model
  Hang Zhou · Jonas Mueller · Mayank Kumar · Jane-Ling Wang · Jing Lei
- 2023 : ObjectLab: Automated Diagnosis of Mislabeled Images in Object Detection Data
  Ulyana Tkachenko · Aditya Thyagarajan · Jonas Mueller
- 2022 : Model-Agnostic Label Quality Scoring to Detect Real-World Label Errors
  Jonas Mueller
- 2021 : Q&A Contributed Talk
  Jonas Mueller
- 2021 : Contributed Talk: Multimodal AutoML on Structured Tables with Text Fields
  Jonas Mueller
- 2021 Poster: Deep Learning for Functional Data Analysis with Adaptive Basis Layers
  Junwen Yao · Jonas Mueller · Jane-Ling Wang
- 2021 Spotlight: Deep Learning for Functional Data Analysis with Adaptive Basis Layers
  Junwen Yao · Jonas Mueller · Jane-Ling Wang
- 2020 : 1.2 AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data
  Jonas Mueller
- 2020 Poster: Educating Text Autoencoders: Latent Representation Guidance via Denoising
  Tianxiao Shen · Jonas Mueller · Regina Barzilay · Tommi Jaakkola