Every Step Counts: Decoding Trajectories as Authorship Fingerprints of dLLMs
Abstract
Discrete Diffusion Large Language Models (dLLMs) have recently emerged as a promising non-autoregressive paradigm, offering faster inference while achieving strong performance on code generation and mathematical reasoning tasks. In this work, we show that the dLLM decoding mechanism not only improves utility but also enables effective model attribution: by analyzing a response’s decoding trajectory, we can identify its source model and help mitigate risks from model misuse. A key challenge is the diversity of attribution scenarios, which range from distinguishing different models to identifying different checkpoints or backups of the same model. To ensure broad applicability, we focus on two core questions: what information to extract from the decoding trajectory, and how to use it effectively. We first observe that per-step model confidence is ineffective: the bidirectional nature of dLLMs causes mutual influence among decoded tokens, producing highly redundant confidence signals that obscure structural information about decoding order and dependencies. To overcome this, we propose a novel information extraction scheme called the \textit{Directed Decoding Map (DDM)}, which captures structural relationships between decoding steps and reveals model-specific behaviors. Furthermore, to fully leverage the extracted structure, we propose \textit{Gaussian-Trajectory Attribution (GTA)}, which fits a cell-wise Gaussian distribution at each decoding position for each model and uses the log-likelihood difference between trajectories as the attribution score. Extensive experiments across diverse models, datasets, and model-access assumptions validate the effectiveness of our approach.
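The GTA scoring idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each trajectory's DDM features are flattened into a fixed-length vector (one value per decoding "cell"), fits an independent Gaussian per cell for each candidate model, and attributes a new trajectory by the difference of summed per-cell log-likelihoods. All function names and the feature representation are hypothetical.

```python
import numpy as np

def fit_cell_gaussians(trajectories):
    """Fit an independent Gaussian per cell from reference trajectories.

    trajectories: array of shape (n_samples, n_cells), one row per
    trajectory, each cell holding a DDM-derived feature (assumed layout).
    """
    mu = trajectories.mean(axis=0)
    sigma = trajectories.std(axis=0) + 1e-6  # floor to avoid zero variance
    return mu, sigma

def log_likelihood(x, mu, sigma):
    """Sum of per-cell Gaussian log-densities for one trajectory vector."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu) ** 2 / (2 * sigma**2))

def attribution_score(x, params_a, params_b):
    """Log-likelihood difference: positive favors model A, negative model B."""
    return log_likelihood(x, *params_a) - log_likelihood(x, *params_b)
```

In use, one would collect reference trajectories from each candidate model, fit per-model Gaussians once, and then score any suspect trajectory against each pair of candidates; the sign (or argmax over models) gives the attribution decision.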