The Truth Lies Somewhere in the Middle (of the Generated Tokens)
Sophie Wang ⋅ Phillip Isola ⋅ Brian Cheung
Abstract
How should the sequence of hidden states produced during autoregressive generation be compressed into a representation that reflects the model’s internal state? We study representations derived from generated tokens and compare them to grounded embeddings across several domains. We find that pooling embeddings across tokens produces more informative representations than the embedding of any individual token. This observation is consistent with semantic information being distributed across generated tokens rather than localized to a single position. In this setting, alignment with grounded embeddings provides a way to study how a model’s internal representations evolve, and pooling offers a more reliable summary of the model's state across generation.
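The contrast the abstract draws, a pooled summary of all generated tokens versus the embedding at a single position, can be sketched in a few lines. This is an illustrative toy example, not the paper's implementation: the hidden states are random stand-ins for a model's per-token activations, and mean pooling is assumed as the pooling operation.

```python
import numpy as np

def pool_hidden_states(hidden_states: np.ndarray) -> np.ndarray:
    """Mean-pool per-token hidden states of shape (T, D) into one (D,) vector.

    Illustrative choice: averaging treats semantic information as
    distributed across all generated tokens rather than localized.
    """
    return hidden_states.mean(axis=0)

# Toy stand-in for the hidden states of 5 generated tokens, dimension 8.
rng = np.random.default_rng(0)
states = rng.normal(size=(5, 8))

pooled = pool_hidden_states(states)  # summary over the whole generation
last_token = states[-1]              # single-position alternative

print(pooled.shape, last_token.shape)  # both are (8,) vectors
```

Either vector could then be compared (e.g. by cosine similarity) against a grounded embedding; the abstract's claim is that the pooled summary tends to be the more informative of the two.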