DecodeShare: Tracing the Shared Pathways of LLM Decode-Time Decisions
Zishan Shao ⋅ Lixun Zhang ⋅ Kangning Cui ⋅ Yixiao Wang ⋅ Ting Jiang ⋅ Hancheng Ye ⋅ Qinsi Wang ⋅ Zhixu Du ⋅ Yuzhe Fu ⋅ Fan Yang ⋅ Danyang Zhuo ⋅ Yiran Chen ⋅ Hai Li
Abstract
Large language models (LLMs) handle many tasks with one set of parameters, but under KV-cached inference it is unclear what task-general structure, if any, is used at $\textit{decode time}$ rather than during $\textit{prefill}$. We propose $\textbf{DecodeShare}$, a protocol that identifies a low-dimensional subspace that is consistently shared across tasks in decode-time hidden states, and then tests its causal role by removing that subspace only during decoding. In our experiments, ablating the discovered shared subspace degrades decision performance far more than ablating either a prefill-derived subspace or a random subspace under the same intervention budget. We further find that this decode-shared subspace overlaps with common steering vectors, enabling a simple offline adjustment: projecting steering vectors away from the shared subspace can reduce template sensitivity while preserving above-chance task utility, with task-dependent trade-offs. Despite being compact, the shared subspace can serve as a high-leverage causal channel at decode time.
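The core intervention described above, removing a subspace from decode-time hidden states, amounts to an orthogonal projection. Below is a minimal sketch of that operation, assuming the shared subspace is given as an orthonormal basis $U \in \mathbb{R}^{d \times k}$; the variable names, the random stand-in basis, and the usage pattern are illustrative assumptions, not the paper's released implementation.

```python
import torch

def remove_subspace(h: torch.Tensor, U: torch.Tensor) -> torch.Tensor:
    """Project hidden states away from the subspace spanned by the columns
    of U (assumed orthonormal, shape [d, k]): h - (h @ U) @ U^T."""
    return h - (h @ U) @ U.T

# Hypothetical setup: U_shared stands in for a basis of the decode-shared
# subspace (in practice it would be estimated offline from decode-time
# hidden states pooled across tasks); h is a batch of hidden states.
d, k = 4096, 16
U_shared, _ = torch.linalg.qr(torch.randn(d, k))  # stand-in orthonormal basis
h = torch.randn(8, d)                              # [batch, d] hidden states
h_ablated = remove_subspace(h, U_shared)           # decode-time ablation

# The same projection gives the offline steering-vector adjustment the
# abstract mentions: project a steering vector away from the shared subspace.
v_steer = torch.randn(d)
v_adjusted = v_steer - U_shared @ (U_shared.T @ v_steer)
```

Because the projection is linear and the basis is fixed, the ablation can be applied per decode step at negligible cost relative to a forward pass, and the steering-vector adjustment can be computed once offline.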