Towards Long-Horizon Interpretability: Efficient and Faithful Multi-Token Attribution for Reasoning LLMs
Wenbo Pan ⋅ Zhichao Liu ⋅ Xianlong Wang ⋅ Yu Haining ⋅ Xiaohua Jia
Abstract
Token attribution methods provide intuitive explanations for language model outputs by identifying causally important input tokens. However, as modern LLMs increasingly rely on extended reasoning chains, existing methods face two critical challenges: (1) an efficiency bottleneck, where attributing a sequence of $|\mathbf{S}|$ tokens requires $\mathcal{O}(|\mathbf{S}|^2)$ operations, making long-context attribution prohibitively slow; and (2) a faithfulness drop, where intermediate reasoning tokens absorb attribution mass, preventing importance from propagating back to the original input. To address these challenges, we introduce **FlashTrace**, an efficient multi-token attribution method that employs span-wise aggregation to compute attribution over *multi-token targets in a single pass*, reducing complexity to $\mathcal{O}(|\mathbf{S}|)$. Moreover, we design a recursive attribution mechanism that traces importance through intermediate reasoning chains back to source inputs. Extensive experiments on long-context retrieval (RULER) and multi-step reasoning (MATH, MorehopQA) tasks demonstrate that FlashTrace achieves a more than 130× speedup over existing baselines while maintaining superior faithfulness. We further analyze the dynamics of recursive attribution, showing that even a single recursive hop substantially improves faithfulness by tracing importance through the reasoning chain.
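The complexity reduction above can be sketched with a toy model. This is a minimal illustration, not the paper's implementation: the kernel `attribute_one` and all function names are hypothetical stand-ins, and the "span aggregation" here is a simple sum over target scores, standing in for summing log-probabilities over the target span before a single attribution pass.

```python
# Toy sketch (hypothetical names; not FlashTrace itself) of why span-wise
# aggregation cuts multi-token attribution from O(|S|^2) to O(|S|).

def naive_attribution(inputs, targets, attribute_one):
    """One attribution pass per target token: len(targets) passes,
    each touching every input -> O(|S|^2) total work."""
    scores = [0.0] * len(inputs)
    passes = 0
    for t in targets:
        passes += 1
        for i, s in enumerate(attribute_one(inputs, t)):
            scores[i] += s
    return scores, passes

def spanwise_attribution(inputs, targets, attribute_one):
    """Aggregate the whole target span first, then attribute once:
    a single pass over the inputs -> O(|S|) total work."""
    span_score = sum(targets)  # stand-in for summing span log-probs
    return attribute_one(inputs, span_score), 1

# Stand-in attribution kernel: importance of input i for target score t.
# For a linear kernel, summing per-token attributions equals attributing
# the summed span score, so the two routes agree exactly.
attribute_one = lambda inputs, t: [x * t for x in inputs]

inputs, targets = [1.0, 2.0, 3.0], [0.5, 0.25, 0.25]
naive_scores, naive_passes = naive_attribution(inputs, targets, attribute_one)
span_scores, span_passes = spanwise_attribution(inputs, targets, attribute_one)
print(naive_passes, span_passes)    # 3 vs 1
print(naive_scores == span_scores)  # True for this linear kernel
```

For a real LLM the kernel would be a gradient of the (aggregated) target log-probability with respect to the input, and exact equality no longer holds; the point of the sketch is only the pass count, which drops from one per target token to one per span.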