How Good is Post-Hoc Watermarking With Language Model Rephrasing?
Abstract
Generation-time text watermarking embeds statistical signals into text so that AI-generated content can be traced. We explore post-hoc watermarking, in which an LLM rewrites existing text while applying generation-time watermarking, to protect copyrighted documents or to detect their use in training or RAG via watermark radioactivity. Unlike generation-time approaches, which are constrained by how LLMs are served, this setting offers additional degrees of freedom for both generation and detection. We therefore investigate how allocating compute (through larger rephrasing models, beam search, multi-candidate generation, or entropy filtering at detection) affects the quality-detectability trade-off. Among our findings, the simple Gumbel-max scheme surprisingly outperforms more recent alternatives under nucleus sampling and achieves strong detectability and semantic fidelity on open-ended text such as books. Moreover, most methods benefit significantly from beam search, and we counterintuitively find that smaller models outperform larger ones. However, our approach struggles when watermarking verifiable text such as code. This study reveals both the potential and the limitations of post-hoc watermarking, laying groundwork for practical applications and future research.
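To make the Gumbel-max scheme mentioned above concrete, the following is a minimal sketch of how such a watermark samples and detects tokens. The seeding function, secret key, and context-window length here are illustrative assumptions, not the exact implementation evaluated in this work:

```python
import hashlib
import numpy as np

def seeded_uniforms(context, vocab_size, key=b"secret"):
    """Pseudo-random uniforms derived from a secret key and the recent
    context tokens, so detection can re-derive them without the model."""
    h = hashlib.sha256(key + repr(tuple(context)).encode()).digest()
    rng = np.random.default_rng(int.from_bytes(h[:8], "little"))
    return rng.random(vocab_size)

def gumbel_max_sample(probs, context):
    # Choose argmax_i r_i^(1/p_i): an exact sample from `probs`
    # that is biased toward tokens with a high pseudo-random score r_i.
    r = seeded_uniforms(context, len(probs))
    return int(np.argmax(r ** (1.0 / np.maximum(probs, 1e-12))))

def detection_score(tokens, vocab_size, window=4):
    # Sum of -log(1 - r_t) over emitted tokens; watermarked text
    # accumulates a larger total than independently written text.
    s = 0.0
    for t in range(window, len(tokens)):
        r = seeded_uniforms(tokens[t - window:t], vocab_size)
        s += -np.log(1.0 - r[tokens[t]])
    return s
```

Because the uniforms depend only on the key and a short context window, the detector needs neither the model nor its logits, only the token sequence, which is what enables post-hoc detection of rephrased text.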