Poster

Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues

Antonio Orvieto · Soham De · Caglar Gulcehre · Razvan Pascanu · Samuel Smith


Abstract:

Deep neural networks based on linear complex-valued RNNs interleaved with position-wise MLPs are gaining traction as competitive approaches to sequence modeling. Examples of such architectures include state-space models (SSMs) like S4 and Mamba, recently proposed architectures that achieve promising performance on text, genetics, and other data that require long-range reasoning. Despite experimental evidence highlighting the effectiveness and computational efficiency of these architectures, their expressive power remains relatively unexplored, especially in connection with specific design choices crucial in practice (e.g., initialization, complex eigenvalues). In this paper, we show that combining MLPs with either real or complex linear diagonal recurrences leads to arbitrarily precise approximation of regular sequence-to-sequence maps. At the heart of our proof, we rely on a separation of concerns: the linear RNN provides a lossless encoding of the input sequence, and the MLP performs non-linear processing on this encoding. While we show that real diagonal linear recurrences are theoretically sufficient to achieve universality in this architecture, we prove that using complex eigenvalues near the unit disk -- i.e., empirically the most successful strategy in SSMs -- greatly helps the RNN in storing information. We connect this finding with the vanishing gradient issue and provide experimental evidence supporting our claims.
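The architecture class the abstract describes can be sketched in a few lines: a diagonal linear recurrence with complex eigenvalues near the unit disk encodes the input sequence, and a position-wise MLP applies non-linear processing to each encoded state. The following is a minimal NumPy sketch under assumed dimensions and random weights; the function names, the eigenvalue radius (0.99), and the choice of feeding the MLP the concatenated real and imaginary parts are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def diagonal_linear_rnn(u, lam, B):
    """Complex diagonal linear recurrence x_t = diag(lam) x_{t-1} + B u_t.

    u:   (T, d_in) real input sequence
    lam: (N,) complex eigenvalues (chosen near the unit disk, as in SSM practice)
    B:   (N, d_in) complex input matrix
    Returns all hidden states, shape (T, N), complex.
    """
    T, N = u.shape[0], lam.shape[0]
    x = np.zeros(N, dtype=complex)
    states = np.empty((T, N), dtype=complex)
    for t in range(T):
        x = lam * x + B @ u[t]   # purely linear state update
        states[t] = x
    return states

def position_wise_mlp(z, W1, b1, W2, b2):
    """One-hidden-layer ReLU MLP applied independently at every time step."""
    h = np.maximum(0.0, z @ W1 + b1)
    return h @ W2 + b2

rng = np.random.default_rng(0)
T, d_in, N, d_hidden, d_out = 16, 2, 8, 32, 3

# Eigenvalues just inside the unit circle: radius 0.99, random phases
# (illustrative of the "complex eigenvalues near the unit disk" regime).
lam = 0.99 * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, N))
B = (rng.standard_normal((N, d_in))
     + 1j * rng.standard_normal((N, d_in))) / np.sqrt(d_in)

u = rng.standard_normal((T, d_in))
x = diagonal_linear_rnn(u, lam, B)

# Hand the MLP the real and imaginary parts of the encoding.
z = np.concatenate([x.real, x.imag], axis=-1)        # (T, 2N)
W1 = rng.standard_normal((2 * N, d_hidden)); b1 = np.zeros(d_hidden)
W2 = rng.standard_normal((d_hidden, d_out)); b2 = np.zeros(d_out)
y = position_wise_mlp(z, W1, b1, W2, b2)             # (T, d_out)
print(y.shape)
```

Because all eigenvalue magnitudes are below 1, the states stay bounded for bounded inputs, while a magnitude close to 1 keeps past inputs from decaying too quickly -- the trade-off the abstract connects to information storage and vanishing gradients.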
