IACW: Intent-Aware Controllable Watermarking for Scalable Authorial Intent Attribution
Abstract
As Large Language Models (LLMs) integrate into writing workflows, precise governance requires distinguishing ``how AI participated'' rather than merely ``whether AI was used.'' Traditional binary detection often misclassifies ``AI-polished'' content as generated, creating fairness risks. We propose shifting from passive post-hoc detection to active intent attribution, focusing on the distinction between Editing (source-anchored) and Generation (unanchored). We introduce \textbf{IACW-Instruct}, a corpus of diverse editing operations constructed via a Director--Actor--Judge pipeline to enable systematic evaluation. Building on this benchmark, we propose \textbf{Intent-Aware Controllable Watermarking (IACW)}, featuring intent-adaptive entropy gating for semantically lossless embedding. Experiments show that IACW achieves 95\% attribution accuracy under 20\% token deletion while preserving near-unwatermarked semantic fidelity, establishing a practical paradigm for fine-grained provenance.