Detecting the Semantic Fixed Point: A Geometric Framework for Efficient Inference
Jiawei Gu ⋅ Ziyue Qiao ⋅ Xiao Luo
Abstract
Each layer of a Transformer refines the hidden state toward a prediction, an iterative process resembling fixed-point iteration. Yet when should this iteration terminate? Existing early-exit methods rely on output confidence as a proxy for internal convergence. We take a more direct approach by examining the geometry of the hidden-state trajectory. We find that layer-wise updates exhibit a two-phase structure: large, volatile updates in early layers, followed by small, aligned updates as the model propagates an already-formed representation. The transition is remarkably sharp. This yields a simple criterion: exit when the step size vanishes and the direction stabilizes. We track the normalized update norm and the cosine similarity between consecutive updates, exiting when both indicate convergence. The overhead is $O(d)$ per layer, independent of vocabulary size, and requires no learned components or architectural modifications. On LLaMA-2-7B and LLaMA-2-13B across question answering and commonsense reasoning tasks, this geometric criterion reduces FLOPs by 30--35\% while retaining over 98\% of full-depth accuracy.
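To make the criterion concrete, here is a minimal sketch of the exit test on three consecutive hidden states of a single token. The function name `geometric_exit` and the threshold values `tau_norm` and `tau_cos` are illustrative assumptions; the abstract specifies the two signals (normalized update norm and cosine similarity between consecutive updates) but not the thresholds.

```python
import torch
import torch.nn.functional as F

def geometric_exit(h_prev2: torch.Tensor,
                   h_prev: torch.Tensor,
                   h_curr: torch.Tensor,
                   tau_norm: float = 0.05,   # assumed threshold on relative step size
                   tau_cos: float = 0.9      # assumed threshold on update alignment
                   ) -> bool:
    """Exit test on hidden states h_{l-2}, h_{l-1}, h_l of one token (shape [d]).

    Signals:
      1. normalized update norm  ||h_l - h_{l-1}|| / ||h_l||  -- step size vanishes
      2. cosine similarity between consecutive updates        -- direction stabilizes
    Both are O(d) per layer and independent of vocabulary size.
    """
    delta_curr = h_curr - h_prev    # update applied by layer l
    delta_prev = h_prev - h_prev2   # update applied by layer l-1
    rel_norm = delta_curr.norm() / (h_curr.norm() + 1e-8)
    cos = F.cosine_similarity(delta_curr, delta_prev, dim=-1)
    return bool(rel_norm < tau_norm and cos > tau_cos)
```

In use, a caller would evaluate this test after each decoder layer, keeping the last two hidden states, and skip the remaining layers once it returns true; because the check never touches the output head, its cost stays $O(d)$ regardless of vocabulary size.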