Collaborative Threshold Watermarking
Tameem Bakr ⋅ Anish Ambreth ⋅ Nils Lukas
Abstract
In federated learning (FL), $K$ clients jointly train a model without sharing raw data. Because each participant invests data and computing power, clients need mechanisms to later prove the provenance of a jointly trained model. Model watermarking embeds a hidden signal in the weights, but naive approaches either do not scale with many clients (per-client watermarks dilute as $K$ grows) or give any individual client the ability to verify (and potentially remove) a shared-key watermark. We introduce $(t,K)$-threshold watermarking: clients collaboratively embed a single watermark during training; any coalition of at least $t$ clients can reconstruct the watermark key and verify a suspect model, while any coalition of fewer than $t$ clients learns nothing about the watermark beyond the verification output. We instantiate our protocol in the white-box setting and evaluate on CIFAR-10, CIFAR-100, and Tiny ImageNet. Our watermark remains detectable at scale (up to $K=128$) with minimal accuracy loss and stays above the detection threshold ($z\ge 4$) under 90% pruning, 4-bit quantization, and adaptive fine-tuning using up to 20% of the training data.