Data valuation aims to quantify the usefulness of individual data sources in training machine learning (ML) models, and is a critical aspect of data-centric ML research. Despite its importance, data valuation faces significant yet frequently overlooked privacy challenges. This paper studies these privacy challenges with a focus on KNN-Shapley, one of the most practical data valuation methods currently available. We first highlight the inherent privacy risks of KNN-Shapley and demonstrate the significant technical difficulties in adapting it to satisfy differential privacy (DP). To overcome these challenges, we introduce \emph{TKNN-Shapley}, a refined, privacy-friendly variant of KNN-Shapley that can be modified straightforwardly to provide a DP guarantee (\emph{DP-}TKNN-Shapley). We show that DP-TKNN-Shapley has several advantages and offers a better privacy-utility tradeoff than naively privatized KNN-Shapley in discerning data quality. Moreover, even non-private TKNN-Shapley achieves performance comparable to that of KNN-Shapley. Overall, our findings suggest that TKNN-Shapley is a promising alternative to KNN-Shapley, particularly for real-world applications involving sensitive data. The full version of the paper is attached in the Appendix.
Author Information
Jiachen Wang (Princeton University)
Yuqing Zhu (UC Santa Barbara)
Yu-Xiang Wang (UC Santa Barbara / Amazon)
Ruoxi Jia (Virginia Tech)
Prateek Mittal (Princeton University)
More from the Same Authors
-
2021 : Optimal Accounting of Differential Privacy via Characteristic Function »
Yuqing Zhu · Jinshuo Dong · Yu-Xiang Wang -
2022 : Optimal Dynamic Regret in LQR Control »
Dheeraj Baby · Yu-Xiang Wang -
2023 : Teach GPT To Phish »
Ashwinee Panda · Zhengming Zhang · Yaoqing Yang · Prateek Mittal -
2023 : Characterizing the Optimal $0-1$ Loss for Multi-class Classification with a Test-time Attacker »
Sophie Dai · Wenxin Ding · Arjun Nitin Bhagoji · Daniel Cullina · Ben Zhao · Heather Zheng · Prateek Mittal -
2023 : Forward-INF : Efficient Data Influence Estimation with Duality-based Counterfactual Analysis »
Myeongseob Ko · Feiyang Kang · Weiyan Shi · Ming Jin · Zhou Yu · Ruoxi Jia -
2023 : Data-Centric Defense: Shaping Loss Landscape with Augmentations to Counter Model Inversion »
Si Chen · Feiyang Kang · Nikhil Abhyankar · Ming Jin · Ruoxi Jia -
2023 : Performance Scaling via Optimal Transport: Enabling Data Selection from Partially Revealed Sources »
Feiyang Kang · Hoang Anh Just · Anit Kumar Sahu · Ruoxi Jia -
2023 : Data Banzhaf: A Robust Data Valuation Framework for Machine Learning »
Jiachen Wang · Ruoxi Jia -
2023 : On the Reproducibility of Data Valuation under Learning Stochasticity »
Jiachen Wang · Feiyang Kang · Chiyuan Zhang · Ruoxi Jia · Prateek Mittal -
2023 : Why Quantization Improves Generalization: NTK of Binary Weight Neural Network »
Kaiqi Zhang · Ming Yin · Yu-Xiang Wang -
2023 : Generative Autoencoders as Watermark Attackers: Analyses of Vulnerabilities and Threats »
Xuandong Zhao · Kexun Zhang · Yu-Xiang Wang · Lei Li -
2023 : Provable Robust Watermarking for AI-Generated Text »
Xuandong Zhao · Prabhanjan Ananth · Lei Li · Yu-Xiang Wang -
2023 : Differentially Private Generation of High Fidelity Samples From Diffusion Models »
Vikash Sehwag · Ashwinee Panda · Ashwini Pokle · Xinyu Tang · Saeed Mahloujifar · Mung Chiang · Zico Kolter · Prateek Mittal -
2023 : Visual Adversarial Examples Jailbreak Aligned Large Language Models »
Xiangyu Qi · Kaixuan Huang · Ashwinee Panda · Mengdi Wang · Prateek Mittal -
2023 Poster: Revisiting Data-Free Knowledge Distillation with Poisoned Teachers »
Junyuan Hong · Yi Zeng · Shuyang Yu · Lingjuan Lyu · Ruoxi Jia · Jiayu Zhou -
2023 Poster: MultiRobustBench: Benchmarking Robustness Against Multiple Attacks »
Sophie Dai · Saeed Mahloujifar · Chong Xiang · Vikash Sehwag · Pin-Yu Chen · Prateek Mittal -
2023 Poster: Offline Reinforcement Learning with Closed-Form Policy Improvement Operators »
Jiachen Li · Edwin Zhang · Ming Yin · Jerry Bai · Yu-Xiang Wang · William Wang -
2023 Poster: 2D-Shapley: A Framework for Fragmented Data Valuation »
Zhihong Liu · Hoang Anh Just · Xiangyu Chang · Xi Chen · Ruoxi Jia -
2023 Poster: Effectively Using Public Data in Privacy Preserving Machine Learning »
Milad Nasresfahani · Saeed Mahloujifar · Xinyu Tang · Prateek Mittal · Amir Houmansadr -
2023 Poster: Protecting Language Generation Models via Invisible Watermarking »
Xuandong Zhao · Yu-Xiang Wang · Lei Li -
2023 Poster: Differentially Private Optimization on Large Model at Small Cost »
Zhiqi Bu · Yu-Xiang Wang · Sheng Zha · George Karypis -
2023 Poster: Non-stationary Reinforcement Learning under General Function Approximation »
Songtao Feng · Ming Yin · Ruiquan Huang · Yu-Xiang Wang · Jing Yang · Yingbin LIANG -
2023 Poster: Global Optimization with Parametric Function Approximation »
Chong Liu · Yu-Xiang Wang -
2023 Poster: Uncovering Adversarial Risks of Test-Time Adaptation »
Tong Wu · Feiran Jia · Xiangyu Qi · Jiachen Wang · Vikash Sehwag · Saeed Mahloujifar · Prateek Mittal -
2022 : Learner Knowledge Levels in Adversarial Machine Learning »
Sophie Dai · Prateek Mittal -
2022 Poster: Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost »
Dan Qiao · Ming Yin · Ming Min · Yu-Xiang Wang -
2022 Spotlight: Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost »
Dan Qiao · Ming Yin · Ming Min · Yu-Xiang Wang -
2022 Poster: Neurotoxin: Durable Backdoors in Federated Learning »
Zhengming Zhang · Ashwinee Panda · Linyue Song · Yaoqing Yang · Michael Mahoney · Prateek Mittal · Kannan Ramchandran · Joseph E Gonzalez -
2022 Spotlight: Neurotoxin: Durable Backdoors in Federated Learning »
Zhengming Zhang · Ashwinee Panda · Linyue Song · Yaoqing Yang · Michael Mahoney · Prateek Mittal · Kannan Ramchandran · Joseph E Gonzalez -
2021 Poster: Lower Bounds on Cross-Entropy Loss in the Presence of Test-time Adversaries »
Arjun Nitin Bhagoji · Daniel Cullina · Vikash Sehwag · Prateek Mittal -
2021 Spotlight: Lower Bounds on Cross-Entropy Loss in the Presence of Test-time Adversaries »
Arjun Nitin Bhagoji · Daniel Cullina · Vikash Sehwag · Prateek Mittal -
2019 Poster: Analyzing Federated Learning through an Adversarial Lens »
Arjun Nitin Bhagoji · Supriyo Chakraborty · Prateek Mittal · Seraphin Calo -
2019 Oral: Analyzing Federated Learning through an Adversarial Lens »
Arjun Nitin Bhagoji · Supriyo Chakraborty · Prateek Mittal · Seraphin Calo -
2019 Poster: Poission Subsampled Rényi Differential Privacy »
Yuqing Zhu · Yu-Xiang Wang -
2019 Oral: Poission Subsampled Rényi Differential Privacy »
Yuqing Zhu · Yu-Xiang Wang