Poster

A Stealthy, Accessible, and Provably Resilient Watermark for Language Models

Yihan Wu · Zhengmian Hu · Junfeng Guo · Hongyang Zhang · Heng Huang


Abstract:

Watermarking techniques offer a promising way to identify machine-generated content by embedding covert information into the text produced by language models. A central challenge in this domain is preserving the distribution of the original generated content after watermarking. Our research extends and improves upon existing watermarking frameworks, placing emphasis on the importance of a Distribution-Preserving (DiP) watermark. In contrast to current strategies, our proposed DiPmark provably preserves the original token distribution during watermarking (stealthy), is detectable without access to the language model API or prompts (accessible), and is provably robust to moderate changes of tokens (resilient). DiPmark operates by selecting a random set of tokens prior to the generation of each word, then modifying the token distribution through a distribution-preserving adjustment function that increases the probability of these selected tokens during sampling. Extensive empirical evaluation on various language models and tasks demonstrates our approach's stealthiness, accessibility, and resilience, making it an effective solution for watermarking tasks that demand impeccable quality preservation.
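The abstract's mechanism can be sketched in a few lines. Below is a minimal, illustrative Python sketch of an alpha-style distribution-preserving reweight: tokens are arranged with a pseudorandomly chosen "red" set first, and probability mass in the low quantile [0, alpha] of the resulting CDF is shifted onto the high quantile [1-alpha, 1], boosting the favored ("green") tokens. The function name and exact reweight form are assumptions for illustration, not the paper's verbatim implementation; the key property, that averaging over random token orderings recovers the original distribution, can be checked empirically.

```python
def dip_reweight(probs, order, alpha=0.3):
    """Illustrative alpha-reweight (assumed form, not the paper's exact code).

    probs : dict mapping token id -> original probability (sums to 1)
    order : list of token ids with the pseudorandom "red" set placed first
    alpha : in [0, 0.5]; alpha=0 leaves the distribution unchanged

    The CDF along `order` is remapped as
        F_new(s) = max(F(s) - alpha, 0) + max(F(s) - (1 - alpha), 0),
    which removes mass from early (red) tokens and doubles the mass of
    late (green) tokens, while total mass stays exactly 1.
    """
    # cumulative distribution along the chosen token ordering
    cdf, total = [], 0.0
    for t in order:
        total += probs[t]
        cdf.append(total)

    # remap the CDF, then difference it back into per-token probabilities
    new, prev = {}, 0.0
    for t, f in zip(order, cdf):
        f_new = max(f - alpha, 0.0) + max(f - (1.0 - alpha), 0.0)
        new[t] = f_new - prev  # nonnegative: F_new is nondecreasing in f
        prev = f_new
    return new
```

For a two-token toy distribution p = (0.7, 0.3) with alpha = 0.3, the ordering [0, 1] yields (0.4, 0.6) and the reversed ordering yields (1.0, 0.0); averaging the two recovers (0.7, 0.3) exactly, which is the distribution-preserving property in miniature.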
