On the Fragility of Data Attribution When Learning Is Distributed
Abstract
Data attribution has become a core primitive for pricing, auditing, and governing machine learning pipelines, yet current methods implicitly assume that attribution value faithfully reflects each participant's contribution. We show that this assumption can fail: a single participant in a standard distributed training workflow can substantially inflate its measured attribution value while leaving global utility intact. Our attribution-first attack uses a latent optimization procedure that injects small, utility-preserving synthetic batches crafted to exploit non-IID label coverage and evaluator sensitivities. Across datasets, models, and multiple marginal-utility evaluators, the attack consistently raises the adversary's attribution value and reshapes the relative attribution structure among benign clients, without degrading accuracy or triggering geometry-based defenses. These results demonstrate that attribution itself constitutes a new attack surface and motivate the development of attribution-robust, incentive-compatible scoring mechanisms.