MuonSSM: Orthogonalizing State Space Models for Sequence Modeling
Abstract
State-space models (SSMs) have emerged as efficient linear-time alternatives to attention for long-sequence modeling. However, existing SSMs often suffer from instability and memory degradation over extended horizons due to poorly conditioned first-order updates and uncontrolled spectral geometry. We introduce MuonSSM, a general framework that stabilizes SSM training by explicitly conditioning the geometry of memory updates rather than the recurrent transition matrix. MuonSSM augments standard SSMs with a momentum-based pathway and lightweight Newton–Schulz iterations on low-rank input injections, yielding approximately norm-preserving, spectrally balanced updates while retaining the parallel-scan complexity of the base model. Theoretical analysis shows that this conditioning substantially improves gradient propagation and mitigates vanishing gradients over long horizons. Extensive experiments on language, vision, and time-series benchmarks show that MuonSSM, integrated into diverse SSM backbones, delivers consistent gains in accuracy, robustness, and long-context performance. These results establish geometric conditioning of updates as a principled path to stable, scalable sequence modeling.
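To make the update-conditioning step concrete, the following is a minimal sketch of the classical cubic Newton–Schulz iteration used to approximately orthogonalize a low-rank injection matrix. The function name, iteration count, and NumPy-based implementation are illustrative assumptions for exposition, not the paper's exact procedure.

```python
import numpy as np

def newton_schulz_orthogonalize(A, num_iters=10):
    """Approximately orthogonalize A with the cubic Newton-Schulz iteration.

    Drives the singular values of A toward 1, so the result is close to the
    orthogonal polar factor U V^T of A = U S V^T (illustrative sketch only).
    """
    # Scale so the spectral norm is below 1; the Frobenius norm upper-bounds
    # the spectral norm, which guarantees the iteration converges.
    X = A / (np.linalg.norm(A) + 1e-7)
    I = np.eye(A.shape[1])
    for _ in range(num_iters):
        # Cubic Newton-Schulz step: X <- X (3 I - X^T X) / 2
        X = 0.5 * X @ (3.0 * I - X.T @ X)
    return X

# Hypothetical usage: condition a (state_dim x rank) input-injection matrix
# before it enters the recurrent update.
rng = np.random.default_rng(0)
B = rng.normal(size=(64, 8))
B_orth = newton_schulz_orthogonalize(B)
# Columns are now nearly orthonormal; this deviation should be close to zero.
print(np.max(np.abs(B_orth.T @ B_orth - np.eye(8))))
```

Because the iteration uses only matrix multiplications on the low-rank factor, it adds little overhead and leaves the parallel-scan recurrence untouched, which is consistent with the complexity claim above.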