On the Fragility of Data Attribution When Learning Is Distributed
Abstract
Data attribution has become a core primitive for pricing, auditing, and governing machine learning pipelines, yet current methods implicitly assume that attribution value faithfully reflects each participant's contribution. We show that this assumption can fail: a single participant in a standard distributed training workflow can substantially inflate its measured attribution value while leaving global utility intact. Our attribution-first attack uses a latent optimization procedure that injects small, utility-preserving synthetic batches crafted to exploit non-IID label coverage and evaluator sensitivities. Across datasets, models, and multiple marginal-utility evaluators, the attack consistently raises the adversary's attribution value and reshapes the relative attribution structure among benign clients, without degrading accuracy or triggering geometry-based defenses. These results demonstrate that attribution itself constitutes a new attack surface and motivate the development of attribution-robust, incentive-compatible scoring mechanisms.