Poster in Workshop: Next Generation of Sequence Modeling Architectures
Viewing Attention as a Recurrent Neural Network
Leo Feng · Frederick Tung · Hossein Hajimirsadeghi · Mohamed Osama Ahmed · Yoshua Bengio · Greg Mori
The advent of Transformers, a highly performant, parallelizable architecture, marked a significant breakthrough in sequence modelling. However, Transformers are computationally expensive at inference time, limiting their applications, particularly in low-resource settings (e.g., embedded devices). Addressing this, we (1) show that attention can be viewed as a special Recurrent Neural Network (RNN) with the ability to compute its many-to-one RNN output efficiently. For sequence modelling, we (2) introduce a new efficient method of computing attention's many-to-many RNN output based on the parallel prefix scan algorithm. Building on the new attention formulation, we (3) introduce Aaren, an attention-based module that can not only (i) be trained in parallel (like Transformers) but also (ii) be updated efficiently with new tokens, requiring only constant memory (like traditional RNNs). Empirically, we show that Aarens achieve performance comparable to Transformers on reinforcement learning and event forecasting tasks while being more time- and memory-efficient.
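The computational idea behind the abstract is that a softmax-attention output for a query over the tokens seen so far can be maintained as a running numerator/denominator pair (with a running max of the logits for numerical stability), giving a constant-memory, one-token-at-a-time update; and since combining two such partial summaries is associative, all prefix outputs can be produced with a parallel prefix scan. The sketch below illustrates both views in JAX. It is a minimal reconstruction under these assumptions: the function names (`update`, `prefix_attention`, `combine`) and the exact running-max stabilisation are illustrative, not the authors' code or the full Aaren module.

```python
import jax
import jax.numpy as jnp

# ---- Many-to-one attention as a constant-memory recurrence ---------------
# State: (running max of logits, running softmax numerator, running
# softmax denominator). The running max keeps the exponentials stable.

def init_state(d_v):
    return (-jnp.inf, jnp.zeros(d_v), 0.0)

def update(state, q, k_t, v_t):
    m_prev, a_prev, c_prev = state
    s_t = jnp.dot(q, k_t)              # logit of the new token
    m_t = jnp.maximum(m_prev, s_t)
    scale = jnp.exp(m_prev - m_t)      # rescale old accumulators to the new max
    w_t = jnp.exp(s_t - m_t)           # weight of the new token
    return (m_t, a_prev * scale + w_t * v_t, c_prev * scale + w_t)

def readout(state):
    _, a_t, c_t = state
    return a_t / c_t                   # softmax(q K^T) V over tokens seen so far

# ---- Many-to-many outputs via a parallel prefix scan ----------------------
# Each token contributes the triple (logit, value, 1); combining two triples
# is associative, so every prefix output can be computed in parallel.

def combine(left, right):
    m_l, a_l, c_l = left
    m_r, a_r, c_r = right
    m = jnp.maximum(m_l, m_r)
    sl, sr = jnp.exp(m_l - m), jnp.exp(m_r - m)
    return (m, a_l * sl[..., None] + a_r * sr[..., None], c_l * sl + c_r * sr)

def prefix_attention(q, K, V):
    # q: (d_k,), K: (T, d_k), V: (T, d_v) -> (T, d_v); row t attends to tokens 0..t
    s = K @ q
    m, a, c = jax.lax.associative_scan(combine, (s, V, jnp.ones_like(s)))
    return a / c[:, None]

# Sanity check against standard softmax attention on random data.
q = jax.random.normal(jax.random.PRNGKey(0), (8,))
K = jax.random.normal(jax.random.PRNGKey(1), (16, 8))
V = jax.random.normal(jax.random.PRNGKey(2), (16, 4))

state = init_state(4)
for k_t, v_t in zip(K, V):
    state = update(state, q, k_t, v_t)

full = jax.nn.softmax(K @ q) @ V
assert jnp.allclose(readout(state), full, atol=1e-4)
assert jnp.allclose(prefix_attention(q, K, V)[-1], full, atol=1e-4)
```

The `update` path reflects the constant-memory token-by-token inference highlighted in the abstract, while the `associative_scan` path reflects the parallel prefix-scan computation used for training-time, many-to-many outputs.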