Data-OOB: Out-of-bag Estimate as a Simple and Efficient Data Value
Yongchan Kwon · James Zou
Data valuation is a powerful framework for providing statistical insights into which data are beneficial or detrimental to model training. Many Shapley-based data valuation methods have shown promising results in various downstream tasks; however, they are well known to be computationally challenging because they require training a large number of models. To address this issue, we propose Data-OOB, a new data valuation method for a bagging model that utilizes the out-of-bag estimate. The proposed method is computationally efficient: Data-OOB takes less than $2.25$ hours on a single CPU processor when there are $10^6$ samples to evaluate and the input dimension is $100$. We demonstrate that the proposed method significantly outperforms existing state-of-the-art data valuation methods in identifying mislabeled data, highlighting the potential for applying data values in real-world applications.
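The abstract does not spell out the estimator, so the following is only a minimal sketch of one plausible reading of an out-of-bag data value: train a bagging ensemble on bootstrap samples, and score each training point by how often the models that did not see it predict its label correctly. The function names (`fit_centroids`, `predict_centroids`, `oob_data_values`), the nearest-centroid weak learner, and the 0/1 correctness score are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def fit_centroids(X, y):
    """Weak learner (an assumption for this sketch): one mean vector per class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_centroids(model, X):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    classes, centroids = model
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

def oob_data_values(X, y, n_models=200, seed=0):
    """Hypothetical out-of-bag data value: for each point, the fraction of
    bootstrap models that left it out and still predicted its label correctly."""
    rng = np.random.default_rng(seed)
    n = len(y)
    score_sum = np.zeros(n)
    oob_count = np.zeros(n)
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)   # bootstrap sample, with replacement
        oob = np.ones(n, dtype=bool)
        oob[idx] = False                   # points not drawn are out-of-bag
        if not oob.any():
            continue
        model = fit_centroids(X[idx], y[idx])
        pred = predict_centroids(model, X[oob])
        score_sum[oob] += (pred == y[oob])
        oob_count[oob] += 1
    values = np.full(n, np.nan)            # NaN if a point was never out-of-bag
    seen = oob_count > 0
    values[seen] = score_sum[seen] / oob_count[seen]
    return values
```

Under these assumptions, each model is trained once and reused to score all of its out-of-bag points, so the cost grows with the number of bootstrap models rather than with the number of data subsets a Shapley-style method would evaluate. Points with deliberately flipped labels in well-separated synthetic data would be expected to receive low values, which is the mislabeled-data use case the abstract mentions.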
Author Information
Yongchan Kwon (Columbia University)
James Zou (Stanford University)
More from the Same Authors
-
2021 : Meaningfully Explaining a Model's Mistakes »
Abubakar Abid · James Zou -
2021 : MetaDataset: A Dataset of Datasets for Evaluating Distribution Shifts and Training Conflicts »
Weixin Liang · James Zou -
2021 : Have the Cake and Eat It Too? Higher Accuracy and Less Expense when Using Multi-label ML APIs Online »
Lingjiao Chen · James Zou · Matei Zaharia -
2021 : Machine Learning API Shift Assessments: Change is Coming! »
Lingjiao Chen · James Zou · Matei Zaharia -
2021 : Do Humans Trust Advice More if it Comes from AI? An Analysis of Human-AI Interactions »
Kailas Vodrahalli · James Zou -
2022 : On the nonlinear correlation of ML performance across data subpopulations »
Weixin Liang · Yining Mao · Yongchan Kwon · Xinyu Yang · James Zou -
2023 : Improve Model Inference Cost with Image Gridding »
Shreyas Krishnaswamy · Lisa Dunlap · Lingjiao Chen · Matei Zaharia · James Zou · Joseph Gonzalez -
2023 Poster: Data-OOB: Out-of-bag Estimate as a Simple and Efficient Data Value »
Yongchan Kwon · James Zou -
2023 Poster: Accuracy on the Curve: On the Nonlinear Correlation of ML Performance Between Data Subpopulations »
Weixin Liang · Yining Mao · Yongchan Kwon · Xinyu Yang · James Zou -
2022 : GSCLIP : A Framework for Explaining Distribution Shifts in Natural Language »
Zhiying Zhu · Weixin Liang · James Zou -
2022 : Evaluation of ML in Health/Science »
James Zou -
2022 : Data Sculpting: Interpretable Algorithm for End-to-End Cohort Selection »
Ruishan Liu · James Zou -
2022 : Data Budgeting for Machine Learning »
Weixin Liang · James Zou -
2022 Poster: When and How Mixup Improves Calibration »
Linjun Zhang · Zhun Deng · Kenji Kawaguchi · James Zou -
2022 Poster: Efficient Online ML API Selection for Multi-Label Classification Tasks »
Lingjiao Chen · Matei Zaharia · James Zou -
2022 Poster: Improving Out-of-Distribution Robustness via Selective Augmentation »
Huaxiu Yao · Yu Wang · Sai Li · Linjun Zhang · Weixin Liang · James Zou · Chelsea Finn -
2022 Spotlight: Efficient Online ML API Selection for Multi-Label Classification Tasks »
Lingjiao Chen · Matei Zaharia · James Zou -
2022 Spotlight: Improving Out-of-Distribution Robustness via Selective Augmentation »
Huaxiu Yao · Yu Wang · Sai Li · Linjun Zhang · Weixin Liang · James Zou · Chelsea Finn -
2022 Spotlight: When and How Mixup Improves Calibration »
Linjun Zhang · Zhun Deng · Kenji Kawaguchi · James Zou -
2021 Poster: Improving Generalization in Meta-learning via Task Augmentation »
Huaxiu Yao · Long-Kai Huang · Linjun Zhang · Ying WEI · Li Tian · James Zou · Junzhou Huang · Zhenhui (Jessie) Li -
2021 Spotlight: Improving Generalization in Meta-learning via Task Augmentation »
Huaxiu Yao · Long-Kai Huang · Linjun Zhang · Ying WEI · Li Tian · James Zou · Junzhou Huang · Zhenhui (Jessie) Li -
2021 Poster: How to Learn when Data Reacts to Your Model: Performative Gradient Descent »
Zachary Izzo · Lexing Ying · James Zou -
2021 Spotlight: How to Learn when Data Reacts to Your Model: Performative Gradient Descent »
Zachary Izzo · Lexing Ying · James Zou -
2020 Poster: A Distributional Framework For Data Valuation »
Amirata Ghorbani · Michael Kim · James Zou -
2019 Poster: Concrete Autoencoders: Differentiable Feature Selection and Reconstruction »
Muhammed Fatih Balın · Abubakar Abid · James Zou -
2019 Poster: Discovering Conditionally Salient Features with Statistical Guarantees »
Jaime Roquero Gimenez · James Zou -
2019 Oral: Discovering Conditionally Salient Features with Statistical Guarantees »
Jaime Roquero Gimenez · James Zou -
2019 Oral: Concrete Autoencoders: Differentiable Feature Selection and Reconstruction »
Muhammed Fatih Balın · Abubakar Abid · James Zou -
2019 Poster: Data Shapley: Equitable Valuation of Data for Machine Learning »
Amirata Ghorbani · James Zou -
2019 Oral: Data Shapley: Equitable Valuation of Data for Machine Learning »
Amirata Ghorbani · James Zou -
2018 Poster: CoVeR: Learning Covariate-Specific Vector Representations with Tensor Decompositions »
Kevin Tian · Teng Zhang · James Zou -
2018 Oral: CoVeR: Learning Covariate-Specific Vector Representations with Tensor Decompositions »
Kevin Tian · Teng Zhang · James Zou