Language generation models have become increasingly powerful enablers of many applications. Many such models offer free or affordable API access, which makes them potentially vulnerable to model extraction attacks through distillation. To protect intellectual property (IP) and ensure fair use of these models, various techniques such as lexical watermarking and synonym replacement have been proposed. However, these methods can be nullified by obvious countermeasures such as "synonym randomization". To address this issue, we propose GINSW, a novel method to protect text generation models from being stolen through distillation. The key idea of our method is to inject secret signals into the probability vector of the decoding steps for each target token. We can then detect the secret message by probing a suspect model to tell whether it was distilled from the protected one. Experimental results show that GINSW can effectively identify instances of IP infringement with minimal impact on the generation quality of protected APIs. Our method achieves an absolute improvement of 19 to 29 points in mean average precision (mAP) for detecting suspects over previous methods under watermark removal attacks.
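The abstract's key idea, injecting a secret signal into the decoding-step probability vector and later probing a suspect model for that signal, can be illustrated with a minimal sketch. This is a simplified, hypothetical scheme for intuition only, not the exact GINSW signal: a secret key partitions the vocabulary into two groups, one group's logits get a small boost, and detection checks whether a suspect's generated tokens are biased toward the boosted group (a model distilled from the watermarked API inherits the bias; an independent model scores near 0.5).

```python
import numpy as np

def watermark_logits(logits, secret_key, delta=1.0):
    """Perturb one decoding step's logits with a keyed secret signal.
    A keyed RNG splits the vocabulary into two halves; the selected
    half is boosted by delta. (Illustrative scheme, not exact GINSW.)"""
    rng = np.random.default_rng(secret_key)
    mask = rng.integers(0, 2, size=logits.shape[0]).astype(bool)
    return logits + delta * mask

def detection_score(token_ids, vocab_size, secret_key):
    """Fraction of a suspect model's sampled tokens that fall in the
    boosted group. Scores well above 0.5 suggest distillation from
    the watermarked API; ~0.5 suggests an independent model."""
    rng = np.random.default_rng(secret_key)
    mask = rng.integers(0, 2, size=vocab_size).astype(bool)
    hits = sum(int(mask[t]) for t in token_ids)
    return hits / len(token_ids)
```

Because both functions derive the same vocabulary partition from the same key, detection needs only the secret key and the suspect's output tokens, not access to the protected model's internals.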
Author Information
Xuandong Zhao (UCSB)
Yu-Xiang Wang (UC Santa Barbara / Amazon)
Lei Li (University of California Santa Barbara)
More from the Same Authors
- 2022: Optimal Dynamic Regret in LQR Control
  Dheeraj Baby · Yu-Xiang Wang
- 2023: A Privacy-Friendly Approach to Data Valuation
  Jiachen Wang · Yuqing Zhu · Yu-Xiang Wang · Ruoxi Jia · Prateek Mittal
- 2023: Generating Global Factual and Counterfactual Explainer for Molecule under Domain Constraints
  Danqing Wang · Antonis Antoniades · Ambuj Singh · Lei Li
- 2023: Why Quantization Improves Generalization: NTK of Binary Weight Neural Network
  Kaiqi Zhang · Ming Yin · Yu-Xiang Wang
- 2023: Generative Autoencoders as Watermark Attackers: Analyses of Vulnerabilities and Threats
  Xuandong Zhao · Kexun Zhang · Yu-Xiang Wang · Lei Li
- 2023: Provable Robust Watermarking for AI-Generated Text
  Xuandong Zhao · Prabhanjan Ananth · Lei Li · Yu-Xiang Wang
- 2023 Poster: Offline Reinforcement Learning with Closed-Form Policy Improvement Operators
  Jiachen Li · Edwin Zhang · Ming Yin · Jerry Bai · Yu-Xiang Wang · William Wang
- 2023 Poster: Differentially Private Optimization on Large Model at Small Cost
  Zhiqi Bu · Yu-Xiang Wang · Sheng Zha · George Karypis
- 2023 Poster: Importance Weighted Expectation-Maximization for Protein Sequence Design
  Zhenqiao Song · Lei Li
- 2023 Poster: Non-stationary Reinforcement Learning under General Function Approximation
  Songtao Feng · Ming Yin · Ruiquan Huang · Yu-Xiang Wang · Jing Yang · Yingbin Liang
- 2023 Poster: Global Optimization with Parametric Function Approximation
  Chong Liu · Yu-Xiang Wang
- 2023 Poster: ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval
  Kexun Zhang · Xianjun Yang · William Wang · Lei Li
- 2022 Poster: Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost
  Dan Qiao · Ming Yin · Ming Min · Yu-Xiang Wang
- 2022 Spotlight: Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost
  Dan Qiao · Ming Yin · Ming Min · Yu-Xiang Wang
- 2022 Poster: On the Learning of Non-Autoregressive Transformers
  Fei Huang · Tianhua Tao · Hao Zhou · Lei Li · Minlie Huang
- 2022 Spotlight: On the Learning of Non-Autoregressive Transformers
  Fei Huang · Tianhua Tao · Hao Zhou · Lei Li · Minlie Huang