How Transformers Represent Hierarchies: A Local-to-Global Mechanism
Abstract
Large language models built on autoregressive Transformers excel at next-token prediction, but it is unclear how their internal computations capture the latent hierarchical dependencies that often underlie language. We study this question in a controlled formal-language setting based on probabilistic context-free grammars (PCFGs), where sequences are generated by a latent hierarchical process. Empirically, standard autoregressive Transformers can be trained to accurately match the grammar-induced next-token distribution. Using probing analyses, we find that Transformer hidden states encode information used by classical parsing algorithms. Moreover, this information emerges through a layer-wise progression, revealing a local-to-global mechanism: early layers accumulate local patterns, while later layers aggregate them into a compact summary for next-token prediction. Complementing these empirical findings, we provide an explicit construction of Transformers that parse binary PCFGs with depth \emph{logarithmic} in the input sequence length. Surprisingly, trained Transformers in this setting exhibit prediction behavior and internal representations that closely mirror our construction. Together, our results offer a mechanistic account of how Transformers integrate hierarchical parsing with autoregressive generation.