STLA: Spatiotemporal Lookahead Alignment for Post-Training Quantization
Abstract
Adaptive rounding techniques in Post-Training Quantization (PTQ) enable the efficient deployment of Large Language Models (LLMs) with low resource and data requirements. Learning-based rounding methods are accurate but costly, whereas compensation-based approaches offer a highly efficient alternative. However, combining the two to realize their full potential is hindered by the spatiotemporal misalignment inherent in the decoupled paradigm: temporal parameter conflicts, the invalidation of the initial Round-to-Nearest (RTN) assumption, and spatially inconsistent optimization objectives. This paper introduces STLA, a rounding-optimized PTQ framework that achieves both fast and accurate LLM quantization. STLA resolves temporal inconsistency through cluster-wise integrated rounding optimization, which co-locates the learning and compensation phases. It achieves spatial alignment through a unified global objective derived from the Schur complement, enabling the solver to look ahead and align local rounding decisions with the optimal future compensation of the remaining weights. Furthermore, we propose a Hessian-guided clustering strategy that exploits both diagonal and off-diagonal information to maximize intra-cluster error cancellation. Extensive experiments demonstrate that STLA establishes a new state of the art for low-bit PTQ while maintaining high computational efficiency. The code is available at https://anonymous.4open.science/r/STLA.
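To make the lookahead objective concrete, the following is a minimal sketch under the standard layer-wise quadratic PTQ assumption; the partition into a currently rounded block $a$ and remaining weights $b$ is illustrative notation, not necessarily the paper's exact formulation. With proxy Hessian $H = XX^\top$ and weight perturbation $\delta = (\delta_a; \delta_b)$ partitioned conformally, the reconstruction error is
\[
E(\delta_a, \delta_b) = \begin{pmatrix} \delta_a \\ \delta_b \end{pmatrix}^{\top} \begin{pmatrix} H_{aa} & H_{ab} \\ H_{ba} & H_{bb} \end{pmatrix} \begin{pmatrix} \delta_a \\ \delta_b \end{pmatrix},
\]
and minimizing over the compensation $\delta_b$ yields $\delta_b^{*} = -H_{bb}^{-1} H_{ba}\,\delta_a$, so the residual error is
\[
E^{*}(\delta_a) = \delta_a^{\top}\bigl(H_{aa} - H_{ab} H_{bb}^{-1} H_{ba}\bigr)\,\delta_a = \delta_a^{\top} S\,\delta_a,
\]
where $S$ is the Schur complement of $H_{bb}$ in $H$. Scoring rounding candidates for block $a$ against $S$, rather than against $H_{aa}$ alone, is what allows local rounding decisions to anticipate the optimal future compensation of the remaining weights.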