

Poster in Workshop on Theoretical Foundations of Foundation Models (TF2M)

State Space Models are Comparable to Transformers in Estimating Functions with Dynamic Smoothness

Naoki Nishikawa · Taiji Suzuki


Abstract:

While the capabilities of deep neural networks based on state space models (SSMs) have primarily been investigated through experimental comparisons, their theoretical understanding remains limited. In particular, there is a lack of statistical and quantitative evaluation of whether SSMs can replace Transformers. In this paper, we theoretically explore the tasks in which SSMs can serve as alternatives to Transformers, from the perspective of estimating sequence-to-sequence functions. We consider the setting where the target function has direction-dependent smoothness and prove that SSMs can estimate such functions with the same convergence rate as Transformers. Additionally, we prove that SSMs can estimate the target function as effectively as Transformers even when its smoothness changes depending on the input sequence. Our results suggest that SSMs can replace Transformers when estimating functions in certain classes that appear in practice.
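
As a rough point of reference for what "direction-dependent smoothness" buys an estimator, the sketch below recalls the classical anisotropic nonparametric regression rate. This is a standard textbook illustration, not the paper's exact function class or theorem; the symbols $\beta_i$ and $\tilde{\beta}$ are illustrative.

```latex
% A hedged illustration from classical nonparametric statistics, not the
% paper's exact setting: suppose f : [0,1]^d -> R has Hoelder smoothness
% beta_i along coordinate direction i. The minimax estimation rate is then
% governed by the harmonic-type effective smoothness:
\[
  \tilde{\beta} = \Bigl( \sum_{i=1}^{d} \frac{1}{\beta_i} \Bigr)^{-1},
  \qquad
  \inf_{\hat{f}} \; \sup_{f} \; \mathbb{E}\,\bigl\| \hat{f} - f \bigr\|_{L^2}^2
  \asymp n^{-\frac{2\tilde{\beta}}{2\tilde{\beta} + 1}} .
\]
% Directions with large beta_i contribute little to the sum of 1/beta_i, so
% an estimator that adapts to the anisotropy converges much faster than the
% isotropic rate n^{-2 beta / (2 beta + d)} obtained when all beta_i = beta.
```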
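For readers less familiar with SSM layers, the following is a minimal sketch of the generic discrete-time linear state space recurrence underlying SSM-based architectures such as S4 and Mamba. It is not the specific parameterization analyzed in the paper; all names and dimensions are illustrative.

```python
import numpy as np

# Generic discrete-time linear SSM layer (a sketch, not the paper's model):
#   h_t = A h_{t-1} + B x_t   (recurrent state update)
#   y_t = C h_t               (per-token readout)

def ssm_layer(x, A, B, C):
    """Apply a linear SSM to an input sequence.

    x: (T, d_in) input sequence
    A: (d_state, d_state) state transition matrix
    B: (d_state, d_in)    input matrix
    C: (d_out, d_state)   output matrix
    returns: (T, d_out) output sequence
    """
    h = np.zeros(A.shape[0])
    ys = []
    for t in range(x.shape[0]):
        h = A @ h + B @ x[t]   # update the hidden state
        ys.append(C @ h)       # read out the current token's output
    return np.stack(ys)

# Toy usage: a random stable SSM on a length-8 sequence.
rng = np.random.default_rng(0)
d_in, d_state, d_out, T = 4, 16, 4, 8
A = 0.9 * np.eye(d_state)                          # contractive transition
B = rng.normal(size=(d_state, d_in)) / np.sqrt(d_in)
C = rng.normal(size=(d_out, d_state)) / np.sqrt(d_state)
y = ssm_layer(rng.normal(size=(T, d_in)), A, B, C)
print(y.shape)  # (8, 4)
```

Because the state update is linear and time-invariant, the same sequence-to-sequence map can equivalently be computed as a convolution over the input, which is what makes such layers efficient at long sequence lengths.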
