The Devil is in the Spectrum: Mitigating Representation Collapse in LLMs via Topologically Regularized Side-Path
Yiheng Tao ⋅ Kaiwen Cheng ⋅ Yao Lu ⋅ Chang Liu ⋅ Jie Chen
Abstract
Large Language Models (LLMs) fundamentally suffer from representation collapse, a bottleneck that severely degrades performance in long contexts. We identify that existing approaches risk drifting into one of two pathological extremes: Homogenization Collapse (e.g., attention sinks causing rank deficiency) and Isolation Collapse (e.g., local attention causing context disconnection). Through spectral analysis of attention dynamics, we derive an intrinsic trade-off between Mixing Efficiency (spectral gap) and Information Capacity (effective rank), revealing that standard mechanisms struggle to maximize both simultaneously. To resolve this dilemma, we propose the Topologically Regularized Side-Path (TRSP), a non-invasive architectural intervention designed to achieve spectral balance. TRSP employs a parameter-free Triangular Box mechanism, scaled by a lightweight, length-aware gate, to explicitly regularize the token interaction topology. By combining proximal coupling to preserve effective rank with distal propagation to guarantee a spectral gap, this design maintains a geometrically healthy representation state without altering the core attention mechanism. Experiments demonstrate significant performance improvements on both general-capability and long-context benchmarks. Notably, on the NoLiMa extrapolation benchmark at 8$\times$ the training length, TRSP surpasses strong baselines such as the Differential Transformer and Gated Attention by approximately 30\% and 50\%, respectively.
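The abstract fixes the side-path's overall shape but not its internals: a parameter-free, causal (lower-triangular) mixing operator that combines a local window (proximal coupling) with full-prefix averaging (distal propagation), scaled by a length-aware gate and added outside the unmodified attention block. The PyTorch sketch below is one plausible reading under those assumptions only; the class name `TRSPSidePath`, the `window` size, the box-filter and prefix-mean mixers, and the log-length sigmoid gate are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TRSPSidePath(nn.Module):
    """Hypothetical sketch of a topologically regularized side-path.

    - Proximal coupling: causal box filter over a local window,
      intended to preserve effective rank.
    - Distal propagation: causal prefix mean over the full context,
      intended to guarantee a spectral gap.
    Both mixers are parameter-free; only the length-aware gate learns.
    """

    def __init__(self, window: int = 64):
        super().__init__()
        self.window = window
        # Length-aware gate: scalar sigmoid of log sequence length
        # (an assumed form; the paper's gate may be parameterized differently).
        self.gate = nn.Linear(1, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: attention output of shape (batch, seq_len, d_model)
        B, T, D = x.shape
        # Distal propagation: running prefix mean (lower-triangular averaging).
        counts = torch.arange(1, T + 1, device=x.device, dtype=x.dtype).view(1, T, 1)
        distal = x.cumsum(dim=1) / counts
        # Proximal coupling: causal moving average over the last `window` tokens
        # (left zero-padding slightly underweights the earliest positions).
        xt = F.pad(x.transpose(1, 2), (self.window - 1, 0))
        proximal = F.avg_pool1d(xt, kernel_size=self.window, stride=1).transpose(1, 2)
        # Length-aware scaling of the side-path contribution.
        log_len = torch.log(torch.tensor([[T]], device=x.device, dtype=x.dtype))
        g = torch.sigmoid(self.gate(log_len))  # shape (1, 1), broadcasts over (B, T, D)
        # Non-invasive: the core attention output x is left untouched.
        return x + g * 0.5 * (proximal + distal)
```

In use, such a module would wrap each layer's attention output, e.g. `h = side_path(attn_out)`, leaving the attention weights themselves untouched, consistent with the non-invasive design claimed above.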