Poster
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos · Apoorv Vyas · Nikolaos Pappas · François Fleuret
Wed Jul 15 01:00 PM -- 01:45 PM & Thu Jul 16 01:00 AM -- 01:45 AM (PDT)
Transformers achieve remarkable performance in several tasks, but due to their quadratic complexity with respect to the input's length, they are prohibitively slow for very long sequences. To address this limitation, we express the self-attention as a linear dot-product of kernel feature maps and make use of the associativity property of matrix products to reduce the complexity from $\mathcal{O}(N^2)$ to $\mathcal{O}(N)$, where $N$ is the sequence length. We show that this formulation permits an iterative implementation that dramatically accelerates autoregressive transformers and reveals their relationship to recurrent neural networks. Our Linear Transformers achieve performance similar to vanilla Transformers while being up to 4000x faster on autoregressive prediction of very long sequences.
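As a rough illustration of the idea in the abstract, below is a minimal NumPy sketch of linear attention using the positive feature map φ(x) = elu(x) + 1 from the paper; the function names, shapes, and toy data are illustrative and not the authors' released implementation. The non-causal version applies the associativity trick directly, while the causal version accumulates running sums S and Z one position at a time, which is the recurrent view.

```python
import numpy as np

def phi(x):
    # Feature map from the paper: elu(x) + 1, kept positive so the
    # attention normalizer never vanishes.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Non-causal linear attention: O(N * d * d_v) instead of O(N^2)."""
    Qf, Kf = phi(Q), phi(K)                   # (N, d)
    S = Kf.T @ V                              # (d, d_v): sum_j phi(k_j) v_j^T
    Z = Kf.sum(axis=0)                        # (d,):     sum_j phi(k_j)
    return (Qf @ S) / (Qf @ Z)[:, None]       # (N, d_v)

def causal_linear_attention(Q, K, V):
    """Autoregressive version: a recurrence over the running sums S and Z."""
    N, d = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d, d_v))
    Z = np.zeros(d)
    out = np.empty((N, d_v))
    for i in range(N):                        # constant state per step, like an RNN
        S += np.outer(phi(K[i]), V[i])
        Z += phi(K[i])
        out[i] = (phi(Q[i]) @ S) / (phi(Q[i]) @ Z)
    return out

# Toy check: at the last position the causal recurrence has seen every key
# and value, so it agrees with the non-causal computation there.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
print(np.allclose(causal_linear_attention(Q, K, V)[-1],
                  linear_attention(Q, K, V)[-1]))
```

The key point of the recurrent form is that generating each new token only updates S and Z and reads them once, so autoregressive inference takes constant time and memory per step rather than growing with the sequence length.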
Author Information
Angelos Katharopoulos (Idiap & EPFL)
Apoorv Vyas (Idiap Research Institute and EPFL)
Nikolaos Pappas (University of Washington)
François Fleuret (University of Geneva)
More from the Same Authors
- 2023: Towards Efficient World Models (Eloi Alonso · Vincent Micheli · François Fleuret)
- 2023: 🎤 Fast Causal Attention with Dynamic Sparsity (Daniele Paliotta · Matteo Pagliardini · Martin Jaggi · François Fleuret)
- 2023: DeepEMD: A Transformer-based Fast Estimation of the Earth Mover's Distance (Atul Kumar Sinha · François Fleuret)
- 2023 Poster: Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models (Nikolaos Dimitriadis · Pascal Frossard · François Fleuret)
- 2020 Poster: Optimizer Benchmarking Needs to Account for Hyperparameter Tuning (Prabhu Teja Sivaprasad · Florian Mai · Thijs Vogels · Martin Jaggi · François Fleuret)
- 2019 Poster: Processing Megapixel Images with Deep Attention-Sampling Models (Angelos Katharopoulos · François Fleuret)
- 2019 Oral: Processing Megapixel Images with Deep Attention-Sampling Models (Angelos Katharopoulos · François Fleuret)
- 2018 Poster: Not All Samples Are Created Equal: Deep Learning with Importance Sampling (Angelos Katharopoulos · François Fleuret)
- 2018 Oral: Not All Samples Are Created Equal: Deep Learning with Importance Sampling (Angelos Katharopoulos · François Fleuret)