

Poster in Workshop: Next Generation of Sequence Modeling Architectures

On the Bottleneck of State Space Models: Locality and Oversmoothing

Pragya Srivastava · Peihao Wang · Ruisi Cai · Jiajun Zhu · Pan Li · Zhangyang “Atlas” Wang


Abstract:

State Space Models (SSMs) have emerged as competitive alternatives to transformers in sequence modeling, particularly for processing long sequences. However, their ability to capture long-range dependencies remains limited. We identify two main issues: a local bias within a single SSM layer and oversmoothing when layers are stacked. To address these, we propose Adaptive Jumping Knowledge (AJK), which adaptively combines representations from different layers. Integrating AJK into the Mamba architecture improves performance across various training scenarios, and AJK-enhanced models consistently outperform baseline models on public benchmarks.
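The abstract does not spell out AJK's mechanism, but its name suggests an adaptive variant of Jumping Knowledge (Xu et al., 2018), which fuses the outputs of all layers rather than using only the last one. Below is a minimal PyTorch sketch of one plausible reading: a learned, token-wise softmax gate over per-layer hidden states. All module names, shapes, and the gating scheme are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: the abstract does not specify AJK's exact
# mechanism. This assumes a learned, token-wise softmax gate over the
# hidden states produced by each layer of a stacked SSM (e.g., Mamba).
import torch
import torch.nn as nn


class AdaptiveJumpingKnowledge(nn.Module):
    """Combine per-layer hidden states with learned, token-wise weights."""

    def __init__(self, d_model: int):
        super().__init__()
        # One scalar score per layer output; softmaxed over layers per token.
        self.score = nn.Linear(d_model, 1)

    def forward(self, layer_outputs: list[torch.Tensor]) -> torch.Tensor:
        # layer_outputs: one (batch, seq_len, d_model) tensor per layer.
        h = torch.stack(layer_outputs, dim=2)          # (B, T, L, D)
        weights = torch.softmax(self.score(h), dim=2)  # (B, T, L, 1)
        return (weights * h).sum(dim=2)                # (B, T, D)


# Usage: collect the hidden state after each (hypothetical) SSM block,
# then fuse them instead of reading out only the final layer.
if __name__ == "__main__":
    B, T, D, L = 2, 16, 64, 4
    ajk = AdaptiveJumpingKnowledge(d_model=D)
    outs = [torch.randn(B, T, D) for _ in range(L)]
    print(ajk(outs).shape)  # torch.Size([2, 16, 64])
```

Because the gate is computed per token, deep (heavily smoothed) and shallow (more local) representations can be mixed differently at each position, which is one natural way such a scheme could counteract both the local bias and the oversmoothing the abstract identifies.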
