Data valuation, which quantifies how individual data points contribute to machine learning (ML) model training, is an important problem in data-centric ML research and has enabled a broad variety of applications. Popular data value notions such as the Shapley value are computed from the performance scores of models trained on different data subsets. Recent studies, however, reveal that stochasticity in neural network training algorithms can adversely affect the consistency of data value rankings. Yet how to effectively mitigate the perturbation that actually arises from model training remains an open question. This work introduces TinyMV, a new data value notion designed for improved reproducibility under the stochasticity of stochastic gradient descent (SGD) and its variants. TinyMV is inspired by a surprising yet consistent pattern of learning stochasticity in SGD: the signal-to-noise ratio (SNR) of the change in a model's performance caused by adding a training point is maximized on very small datasets (e.g., ≤15 data points for CIFAR10). Our experiments demonstrate that TinyMV achieves state-of-the-art reproducibility and outperforms existing data valuation techniques across a broad range of applications.
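The abstract does not give TinyMV's exact definition, so the following is only a minimal sketch of the idea it describes, under stated assumptions: a point's value is taken as its average marginal performance gain over randomly sampled subsets of a fixed small size, and the SNR of that gain is measured as squared mean over variance. The `utility` callback, the independent training seeds for the "with" and "without" runs, the default `subset_size` and `n_samples`, and the `toy_utility`/`QUALITY` placeholders are all hypothetical choices for illustration, not the authors' method.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence

# A utility maps (training subset, training seed) -> test performance.
# In practice this would train a model with SGD; here it is a placeholder.
Utility = Callable[[Sequence[int], int], float]


def marginal_gains(point: int, pool: Sequence[int], utility: Utility,
                   subset_size: int, n_samples: int,
                   rng: random.Random) -> List[float]:
    """Sample U(S + {point}) - U(S) over random subsets S of a fixed small size."""
    others = [i for i in pool if i != point]
    gains = []
    for _ in range(n_samples):
        subset = rng.sample(others, subset_size)
        # Assumption: the two trainings use independent seeds, so each gain
        # includes the training noise that the abstract refers to.
        seed_with, seed_without = rng.randrange(2**31), rng.randrange(2**31)
        gains.append(utility(subset + [point], seed_with)
                     - utility(subset, seed_without))
    return gains


def snr(gains: Sequence[float]) -> float:
    """Signal-to-noise ratio: squared mean of the gains over their variance."""
    var = statistics.pvariance(gains)
    return math.inf if var == 0 else statistics.fmean(gains) ** 2 / var


def small_subset_value(point: int, pool: Sequence[int], utility: Utility,
                       subset_size: int = 10, n_samples: int = 200,
                       seed: int = 0) -> float:
    """Average marginal gain of `point` on random subsets of size `subset_size`."""
    rng = random.Random(seed)
    return statistics.fmean(
        marginal_gains(point, pool, utility, subset_size, n_samples, rng))


# Hypothetical toy utility: points 0-9 are "clean" (+1), 10-19 are "noisy" (-1).
# Performance saturates via tanh, and each evaluation adds Gaussian noise that
# stands in for SGD training stochasticity.
QUALITY = [1.0] * 10 + [-1.0] * 10


def toy_utility(subset: Sequence[int], seed: int) -> float:
    signal = math.tanh(sum(QUALITY[i] for i in subset) / 5.0)
    return signal + random.Random(seed).gauss(0.0, 0.05)


if __name__ == "__main__":
    pool = list(range(20))
    for p in (0, 15):  # one clean point, one noisy point
        gains = marginal_gains(p, pool, toy_utility, 10, 200, random.Random(1))
        print(p, round(small_subset_value(p, pool, toy_utility), 3),
              round(snr(gains), 2))
```

With a real model, `utility` would train a network with SGD on the subset and report validation accuracy; sweeping `subset_size` and comparing the resulting `snr` values is one way to probe the small-dataset SNR pattern the abstract reports.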
Author Information
Jiachen Wang (Princeton University)
Feiyang Kang (Virginia Tech)
Chiyuan Zhang (MIT)
Ruoxi Jia (Virginia Tech)
Prateek Mittal (Princeton University)