CircuitPrint: Mechanistic Circuit Fingerprints for Large Language Models
Abstract
Large language models (LLMs) are trained at significant computational and data cost, making them valuable intellectual property (IP). Existing IP verification methods primarily rely either on invasive watermarking, which degrades model utility, or on superficial behavioral signatures that are easily disrupted by fine-tuning and model merging. This apparent trade-off between model utility and IP protection has constrained practical deployment. We challenge this trade-off and propose CircuitPrint, a non-invasive IP fingerprinting framework that enables robust verification through standard model queries by leveraging the stable internal computational circuits of LLMs. We show that these circuits function as a persistent computational backbone across model derivatives, allowing them to serve as stable fingerprints for distinguishing LLMs. Building on this stability, CircuitPrint constructs IP signatures by identifying mechanistically essential supernodes that causally produce specific predictions within these circuits. Specifically, CircuitPrint synthesizes trigger queries that suppress these supernodes internally, inducing distinctive and observable output shifts. Experimental results demonstrate that CircuitPrint substantially outperforms existing baselines while remaining robust under aggressive fine-tuning and model merging, effectively resolving the utility-protection trade-off without altering model parameters.
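The query-only verification step described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name `verify_fingerprint`, the match `threshold`, and the dict-based stand-in models are all assumptions introduced here for illustration; in practice `model` would be a black-box call to an LLM's decoding API, and the expected tokens would come from the output shifts induced by supernode suppression.

```python
# Hedged sketch of black-box fingerprint verification: the suspect model is
# queried with trigger queries, and the fraction of queries reproducing the
# expected output shift is compared against a decision threshold.

def verify_fingerprint(model, trigger_queries, expected_tokens, threshold=0.8):
    """Return True if the model reproduces the expected output shifts
    on at least `threshold` of the trigger queries (query-only check)."""
    hits = sum(
        1
        for query, token in zip(trigger_queries, expected_tokens)
        if model(query) == token
    )
    return hits / len(trigger_queries) >= threshold


# Toy stand-in models: a dict lookup in place of a real LLM's generate call.
suspect_model = {"trigger A": "X", "trigger B": "Y", "trigger C": "Z"}.get
unrelated_model = {"trigger A": "P", "trigger B": "Y", "trigger C": "Q"}.get

queries = ["trigger A", "trigger B", "trigger C"]
expected = ["X", "Y", "Z"]

print(verify_fingerprint(suspect_model, queries, expected))    # True  (3/3 match)
print(verify_fingerprint(unrelated_model, queries, expected))  # False (1/3 match)
```

The threshold tolerates partial signature erosion: a fine-tuned or merged derivative need not match on every trigger query, only on enough of them to exceed chance.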