Fingerprinting Pre-trained Encoders under Arbitrary Downstream Fine-Tuning via Adversarial Shifting
Abstract
In the pre-training-then-fine-tuning paradigm, pre-trained encoders have become high-value intellectual property (IP) due to their immense training costs, necessitating robust protection. Existing fingerprinting and watermarking methods typically rely on pre-defined samples and labels, or require intrusive modifications to the training process. However, downstream fine-tuning can significantly alter an encoder's representation space and label space, destroying the label consistency these methods depend on and rendering them ineffective. Consequently, providing a downstream-agnostic, black-box ownership verification mechanism for pre-trained encoders is both challenging and urgent. To address this, we propose a downstream-agnostic, label-only fingerprinting method that leverages Adversarial Shifting to construct stable fingerprint clusters in the encoder's latent space. Because the samples within each cluster consistently receive the same downstream output, our method remains effective regardless of the specific downstream task or label mapping. Extensive experiments demonstrate that our method maintains superior robustness and stealthiness across diverse downstream tasks and numbers of downstream classes, providing a practical and reliable IP protection scheme for high-value pre-trained encoders.
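To make the verification idea concrete, the following is a minimal, hypothetical sketch of the label-only check described above: fingerprint samples are grouped into clusters (assumed to have been constructed offline by adversarially shifting inputs toward latent clusters of the protected encoder), and ownership evidence comes from whether the suspect model's black-box, label-only API assigns a consistent label within each cluster. The function names, the `suspect_predict` interface, and the `threshold` value are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch: label-only ownership verification via the output
# consistency of fingerprint clusters. Only label *agreement* within a
# cluster matters; the specific label identity is irrelevant, which is
# what makes the check independent of the downstream label mapping.
from collections import Counter


def cluster_consistency(labels):
    """Fraction of samples in one fingerprint cluster sharing the majority label."""
    majority_count = Counter(labels).most_common(1)[0][1]
    return majority_count / len(labels)


def verify(suspect_predict, fingerprint_clusters, threshold=0.9):
    """Black-box, label-only verification.

    suspect_predict: callable returning a label for one input (the only
        access assumed to the suspect downstream model).
    fingerprint_clusters: list of clusters; each cluster is a list of
        fingerprint inputs built offline via adversarial shifting
        (assumed given here).
    threshold: illustrative decision boundary on mean consistency.
    """
    scores = [
        cluster_consistency([suspect_predict(x) for x in cluster])
        for cluster in fingerprint_clusters
    ]
    mean_score = sum(scores) / len(scores)
    return mean_score >= threshold, mean_score
```

A model fine-tuned from the protected encoder should map each cluster to a single (arbitrary) class, yielding a high consistency score, while an independently trained model should scatter the cluster members across classes.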