CGSVD: Cascaded Granular Singular Value Decomposition for Large Language Model Compression
Abstract
The exponential growth in the parameter scale of Large Language Models (LLMs) has precipitated an urgent demand for efficient compression techniques to facilitate practical deployment. To address this challenge, low-rank decomposition based on Singular Value Decomposition (SVD) offers a principled, hardware-friendly pathway for compressing LLMs without retraining. However, existing training-free approaches predominantly rely on uniform rank allocation, implicitly assuming homogeneous redundancy across the model depth and thereby neglecting the inherent non-uniformity of representational evolution. To bridge this gap, we introduce \textbf{CGSVD}, a \uline{\textbf{C}}ascaded \uline{\textbf{G}}ranular \uline{\textbf{S}}ingular \uline{\textbf{V}}alue \uline{\textbf{D}}ecomposition framework that leverages a dual-level non-uniform allocation strategy to maximize semantic preservation. Specifically, we quantify inter-layer significance via angular distance and assess intra-layer compressibility through spectral entropy, enabling precise identification of critical architectural components. Furthermore, we propose an Iterative Residual Filling (IRF) mechanism that closes the parameter gap caused by integer-rank truncation and ensures strict adherence to global compression targets. Extensive experiments on representative LLM families ranging from 3B to 13B parameters demonstrate the superiority of our approach. Notably, under a 30\% compression ratio on the LLaMA3.1-8B model, CGSVD achieves a remarkable average zero-shot accuracy boost of 6.08\% and reduces perplexity by 33.39 compared to the baseline. We release the code\footnote{The code is available at: \url{https://anonymous.4open.science/r/CGSVD-BD6E}.} to facilitate future research.