Poster in Workshop on Theoretical Foundations of Foundation Models (TF2M)
State Space Models are Comparable to Transformers in Estimating Functions with Dynamic Smoothness
Naoki Nishikawa · Taiji Suzuki
While the capabilities of deep neural networks based on state space models (SSMs) have primarily been investigated through experimental comparisons, their theoretical understanding remains limited. In particular, there is a lack of statistical and quantitative evaluation of whether SSMs can replace Transformers. In this paper, we theoretically investigate for which tasks SSMs can serve as alternatives to Transformers, from the perspective of estimating sequence-to-sequence functions. We consider the setting where the target function has direction-dependent smoothness, and prove that SSMs can estimate such functions with the same convergence rate as Transformers. Additionally, we prove that SSMs can estimate the target function as effectively as Transformers even when the smoothness changes depending on the input sequence. Our results suggest that SSMs can replace Transformers when estimating functions in certain classes that appear in practice.
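As background for the abstract (and not part of the paper itself), the sketch below illustrates the linear state space recurrence h_t = A h_{t-1} + B u_t, y_t = C h_t + D u_t that underlies SSM layers in general. The function name `ssm_layer` and all parameter shapes are illustrative assumptions; the exact parameterization analyzed in the paper (e.g., any convolutional form, gating, or nonlinearities) may differ.

```python
# Minimal sketch of a discrete-time linear state space layer (an assumption
# about the general SSM building block, not the paper's exact model).
import numpy as np

def ssm_layer(u, A, B, C, D):
    """Apply a linear state space recurrence to a sequence.

    u: (T, d_in) input sequence
    A: (d_state, d_state), B: (d_state, d_in),
    C: (d_out, d_state), D: (d_out, d_in)
    Returns y: (T, d_out) with h_t = A h_{t-1} + B u_t, y_t = C h_t + D u_t.
    """
    T = u.shape[0]
    h = np.zeros(A.shape[0])
    y = np.empty((T, C.shape[0]))
    for t in range(T):
        h = A @ h + B @ u[t]      # state update (linear recurrence over time)
        y[t] = C @ h + D @ u[t]   # linear readout of the hidden state
    return y

# Toy usage: a 1-dimensional input sequence with a 2-dimensional hidden state.
rng = np.random.default_rng(0)
u = rng.normal(size=(8, 1))
A = 0.9 * np.eye(2)               # stable state transition matrix
B = rng.normal(size=(2, 1))
C = rng.normal(size=(1, 2))
D = np.zeros((1, 1))
print(ssm_layer(u, A, B, C, D).shape)  # (8, 1)
```

Because the state update is linear, each output y_t depends on the whole prefix u_1, ..., u_t through powers of A, which is the structural contrast with Transformer attention that the paper's estimation-rate comparison addresses.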