Poster in Workshop: Next Generation of Sequence Modeling Architectures
On Feature Learning in Structured State Space Models
Leena Chennuru Vankadara · Jin Xu · Moritz Haas · Volkan Cevher
This paper studies the ability of structured state space models (SSMs) to learn features as network width approaches infinity. Our findings reveal that established scaling rules, such as the maximal update parameterization or spectral scaling conditions, fail to support feature learning because these models are not representable as tensor programs. Through a detailed signal propagation analysis in SSMs, both forward and backward, we identify the scaling necessary for non-trivial feature evolution in the infinite-width regime. Our proposed scaling exhibits behavior akin to the maximal update parameterization, including improved stability, better generalization, and transferability of optimal hyperparameters from small-scale to large-scale SSMs.
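For context on what a width-dependent scaling rule prescribes, below is a minimal sketch of one common statement of the maximal update parameterization (muP) recipe for a dense MLP trained with Adam. The width n, layer names, and base learning rate are illustrative assumptions, and this is the baseline parameterization that the abstract says fails to yield feature learning in SSMs; it is not the paper's proposed SSM scaling.

```python
# Sketch (not the paper's SSM scaling): a common muP recipe for a dense MLP
# trained with Adam, showing how initialization and learning rates scale with
# width. Layer names, width n, and base_lr are illustrative assumptions.
import numpy as np

def mup_mlp_params(d_in: int, n: int, d_out: int, base_lr: float = 1e-3, seed: int = 0):
    """Return muP-scaled initializations and per-layer Adam learning rates."""
    rng = np.random.default_rng(seed)
    params = {
        # Input layer: fan_in is the fixed data dimension, so no width scaling.
        "W_in":  rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(n, d_in)),
        # Hidden layer: init variance 1/fan_in = 1/n.
        "W_h":   rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n)),
        # Output layer: init variance 1/fan_in^2, i.e. std 1/n.
        "W_out": rng.normal(0.0, 1.0 / n, size=(d_out, n)),
    }
    lrs = {
        "W_in":  base_lr,      # Theta(1) in width
        "W_h":   base_lr / n,  # Theta(1/n): keeps per-neuron feature updates O(1)
        "W_out": base_lr / n,  # Theta(1/n)
    }
    return params, lrs

# Under this parameterization the size of feature updates stays roughly
# constant as width grows, which is what enables hyperparameter transfer
# across widths for architectures expressible as tensor programs.
for n in (256, 512, 1024):
    params, lrs = mup_mlp_params(d_in=32, n=n, d_out=10)
    print(n, {k: round(v, 8) for k, v in lrs.items()})
```

The abstract's point is that this style of recipe, derived via the tensor-program framework, does not carry over to SSMs as-is; the paper instead derives the appropriate scaling from a forward and backward signal propagation analysis.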