Where’s the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions
Nicole H. Ma ⋅ Nick Rui
Abstract
We study $\textit{planning site formation}$ in language models: $\textit{where}$ internal representations of structurally constrained future tokens form during the forward pass, and whether they causally drive generation. Using rhyming-couplet completion as a clean test of forward-looking constraint, we apply two lightweight methods (linear probing and activation patching) across Qwen3, Gemma-3, and Llama-3 at more than ten scales. Probing shows that future-rhyme information is linearly decodable at the line boundary, with a signal that strengthens with scale in all three families. Activation patching reveals that only Gemma-3-27B causally relies on this encoding, exhibiting a $\textit{handoff}$ in which the causal driver migrates from the rhyme word to the line boundary around layer 30. Every other model we test conditions on the rhyme word throughout generation, with near-zero causal effect at the line boundary despite strong probe signal. Using two-stage path patching, we localize the Gemma-3-27B handoff to five attention heads that recover about 90% of the rhyme-routing capacity at the newline.
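For readers unfamiliar with activation patching, the following is a minimal sketch of the general technique on a toy model. It is not the authors' code and does not use any of the models named above. The names `toy_forward` and `patching_effects` are hypothetical, and the "model" is a hand-written linear mixer with causal averaging, assumed here only so the example runs with the standard library. Position 0 stands in for the rhyme word and the last position for the line boundary.

```python
from itertools import product


def toy_forward(inputs, n_layers=3, patch=None):
    """Run a toy causal model (an assumption for illustration, not a transformer).

    inputs: one float per token position.
    patch:  optional (layer, pos, value); overwrites the activation at that
            position right after that layer has been computed.
    Returns (output read from the last position, per-layer activation cache).
    """
    h = list(inputs)
    cache = []
    for layer in range(n_layers):
        new_h = []
        for pos in range(len(h)):
            # Causal mixing: each position sees itself and earlier positions only.
            context = sum(h[: pos + 1]) / (pos + 1)
            new_h.append(h[pos] + 0.5 * context)
        h = new_h
        if patch is not None and patch[0] == layer:
            h[patch[1]] = patch[2]
        cache.append(list(h))
    return h[-1], cache


def patching_effects(clean_inputs, corrupt_inputs, n_layers=3):
    """Patch each clean activation into the corrupted run, one site at a time.

    Returns {(layer, pos): fraction of the clean-vs-corrupt output gap restored}.
    """
    clean_out, clean_cache = toy_forward(clean_inputs, n_layers)
    corrupt_out, _ = toy_forward(corrupt_inputs, n_layers)
    gap = clean_out - corrupt_out
    effects = {}
    for layer, pos in product(range(n_layers), range(len(clean_inputs))):
        patched_out, _ = toy_forward(
            corrupt_inputs, n_layers, patch=(layer, pos, clean_cache[layer][pos])
        )
        effects[(layer, pos)] = (patched_out - corrupt_out) / gap
    return effects
```

Calling `patching_effects([1.0, 0.0], [-1.0, 0.0])` returns one normalized effect per (layer, position) site. In this toy, patching the last position at the final layer restores the full output gap, while patching position 0 at the final layer restores none of it, because the output is read only from the last position. With a real model the same loop would typically be implemented with forward hooks on residual-stream activations.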