Revealing Long-context Potential of Attention Heads via Frequency Kernels
Abstract
Large language models (LLMs) contain a subset of attention heads that are highly responsible for long-context processing. Existing work has identified various long-context heads in models, but the detection methods mainly rely on running the model on actual long texts and do not analyze the inherent properties of the head parameters. In this paper, we use kernel methods to analyze the static frequency kernels formed by the different rotation frequency components of attention heads, and we design a Long-context Potential Score (LPS) to measure the potential of attention heads for processing long contexts. The kernels of heads with high LPS exhibit concentrated low-frequency energy and low effective rank, which allows them to effectively capture highly specialized information from distant contexts. Experiments and analysis on long-context tasks and model behaviors show that LPS closely reflects the actual long-context capability of heads. Furthermore, by simply amplifying the low-frequency kernels of heads with high retrieval potential, we can further improve the model's performance on long-context tasks. Our metric and head-enhancement method are fully static and offline, and they can be applied quickly under low-resource constraints.
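To make the two static quantities mentioned above concrete, the sketch below shows one way a score of this kind could be computed from a head's query/key projection weights and its RoPE rotation frequencies. It is a minimal illustration, not the paper's definition: the kernel form (W_q W_k^T), the per-band energy measure, the 25% low-frequency cutoff, and all function names are assumptions introduced here for exposition.

```python
import numpy as np


def rope_frequencies(head_dim: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE rotation frequencies theta_i = base^(-2i/d)."""
    return base ** (-2.0 * np.arange(head_dim // 2) / head_dim)


def effective_rank(matrix: np.ndarray) -> float:
    """Entropy-based effective rank computed from the singular value spectrum."""
    s = np.linalg.svd(matrix, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))


def long_context_potential(w_q: np.ndarray, w_k: np.ndarray,
                           base: float = 10000.0,
                           low_freq_fraction: float = 0.25) -> float:
    """
    Illustrative LPS-style score for a single attention head (hypothetical).

    w_q, w_k: (head_dim, hidden_dim) query/key projection slices for the head.
    The score rewards energy concentrated in the slowest rotary frequency
    bands and penalizes a high effective rank of the static kernel.
    """
    head_dim = w_q.shape[0]
    kernel = w_q @ w_k.T                      # static (head_dim x head_dim) kernel
    freqs = rope_frequencies(head_dim, base)  # one frequency per 2-dim rotary block

    # Energy contributed by each rotary frequency band (consecutive row pairs).
    band_energy = np.array([
        np.square(kernel[2 * i:2 * i + 2, :]).sum()
        for i in range(head_dim // 2)
    ])
    band_energy /= band_energy.sum()

    # Fraction of energy held by the lowest-frequency bands.
    n_low = max(1, int(low_freq_fraction * len(freqs)))
    low_energy = band_energy[np.argsort(freqs)[:n_low]].sum()

    # Effective rank normalized to [0, 1]; lower suggests more specialization.
    rank_term = effective_rank(kernel) / head_dim

    return float(low_energy - rank_term)
```

Under this reading, a head scores highly when most of its kernel energy sits in the slow-rotating (long-wavelength) RoPE dimensions while the kernel itself is close to low rank; the actual LPS formulation in the paper may weight or combine these ingredients differently.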