Transformers based on the attention mechanism have achieved impressive success in various areas. However, the attention mechanism has quadratic complexity, which significantly impedes Transformers from handling numerous tokens and scaling up to bigger models. Previous methods mainly exploit the decomposition of the similarity function and the associativity of matrix multiplication to devise linear-time attention mechanisms. They avoid the degeneration of attention to a trivial distribution by reintroducing inductive biases such as locality, at the expense of model generality and expressiveness. In this paper, we linearize Transformers free of specific inductive biases, based on flow network theory. We cast attention as the information flow aggregated from the sources (values) to the sinks (results) through the learned flow capacities (attentions). Within this framework, we apply the property of flow conservation to attention and propose the Flow-Attention mechanism of linear complexity. By respectively conserving the incoming flow of sinks for source competition and the outgoing flow of sources for sink allocation, Flow-Attention inherently generates informative attentions without relying on specific inductive biases. Empowered by Flow-Attention, Flowformer yields strong performance in linear time across a wide range of areas, including long sequences, time series, vision, natural language, and reinforcement learning. The code and settings are available at this repository: https://github.com/thuml/Flowformer
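To make the competition/allocation scheme concrete, below is a minimal non-causal sketch in PyTorch of a flow-conservation attention as described in the abstract. The sigmoid feature map, the eps stabilizer, and the tensor layout are illustrative assumptions; the reference implementation, including the causal variant and additional numerical safeguards, is in the linked repository.

```python
import torch

def flow_attention(q, k, v, eps=1e-6):
    """Linear-complexity attention via flow conservation (minimal sketch).

    q, k, v: tensors of shape (batch, heads, length, dim).
    A sigmoid feature map keeps all "flow capacities" non-negative.
    """
    q, k = torch.sigmoid(q), torch.sigmoid(k)

    # Incoming flow of each sink i:  I_i = phi(Q_i) . sum_j phi(K_j)
    sink_in = torch.einsum("bhld,bhd->bhl", q, k.sum(dim=2)) + eps
    # Outgoing flow of each source j: O_j = phi(K_j) . sum_i phi(Q_i)
    source_out = torch.einsum("bhld,bhd->bhl", k, q.sum(dim=2)) + eps

    # Conservation: fix the incoming flow of every sink (resp. the outgoing
    # flow of every source) to 1, then measure the flow induced on the other
    # side under that constraint.
    conserved_sink = torch.einsum(
        "bhld,bhd->bhl", q, (k / source_out[..., None]).sum(dim=2))
    conserved_source = torch.einsum(
        "bhld,bhd->bhl", k, (q / sink_in[..., None]).sum(dim=2))

    # Source competition: with sink incoming flow conserved, sources must
    # compete for the fixed total flow (softmax over source positions).
    competition = torch.softmax(conserved_source, dim=-1) * k.shape[2]
    # Sink allocation: with source outgoing flow conserved, each sink gates
    # how much of the aggregated flow it keeps.
    allocation = torch.sigmoid(conserved_sink)

    # Linear attention: aggregate values weighted by source competition into
    # a (dim x dim) summary, read out per sink, and gate by sink allocation.
    kv = torch.einsum("bhld,bhlm->bhdm", k, v * competition[..., None])
    out = torch.einsum("bhld,bhdm->bhlm", q / sink_in[..., None], kv)
    return out * allocation[..., None]
```

Because values are aggregated through the small `kv` summary matrix rather than an L x L attention map, the cost is O(L * d^2), i.e., linear in sequence length.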
Author Information
Haixu Wu (Tsinghua University)
Jialong Wu (Tsinghua University)
Jiehui Xu (Tsinghua University)
Jianmin Wang (Tsinghua University)
Mingsheng Long (Tsinghua University)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Poster: Flowformer: Linearizing Transformers with Conservation Flows
  Thu. Jul 21 through Fri. Jul 22, Room Hall E #423
More from the Same Authors
- 2023 Poster: Estimating Heterogeneous Treatment Effects: Mutual Information Bounds and Learning Algorithms
  Xingzhuo Guo · Yuchen Zhang · Jianmin Wang · Mingsheng Long
- 2023 Poster: CLIPood: Generalizing CLIP to Out-of-Distributions
  Yang Shu · Xingzhuo Guo · Jialong Wu · Ximei Wang · Jianmin Wang · Mingsheng Long
- 2023 Poster: Solving High-Dimensional PDEs with Latent Spectral Models
  Haixu Wu · Tengge Hu · Huakun Luo · Jianmin Wang · Mingsheng Long
- 2021 Poster: LogME: Practical Assessment of Pre-trained Models for Transfer Learning
  Kaichao You · Yong Liu · Jianmin Wang · Mingsheng Long
- 2021 Spotlight: LogME: Practical Assessment of Pre-trained Models for Transfer Learning
  Kaichao You · Yong Liu · Jianmin Wang · Mingsheng Long
- 2021 Poster: Representation Subspace Distance for Domain Adaptation Regression
  Xinyang Chen · Sinan Wang · Jianmin Wang · Mingsheng Long
- 2021 Spotlight: Representation Subspace Distance for Domain Adaptation Regression
  Xinyang Chen · Sinan Wang · Jianmin Wang · Mingsheng Long
- 2021 Poster: Self-Tuning for Data-Efficient Deep Learning
  Ximei Wang · Jinghan Gao · Mingsheng Long · Jianmin Wang
- 2021 Poster: Zoo-Tuning: Adaptive Transfer from A Zoo of Models
  Yang Shu · Zhi Kou · Zhangjie Cao · Jianmin Wang · Mingsheng Long
- 2021 Spotlight: Self-Tuning for Data-Efficient Deep Learning
  Ximei Wang · Jinghan Gao · Mingsheng Long · Jianmin Wang
- 2021 Spotlight: Zoo-Tuning: Adaptive Transfer from A Zoo of Models
  Yang Shu · Zhi Kou · Zhangjie Cao · Jianmin Wang · Mingsheng Long
- 2020 Poster: Unsupervised Transfer Learning for Spatiotemporal Predictive Networks
  Zhiyu Yao · Yunbo Wang · Mingsheng Long · Jianmin Wang
- 2019 Poster: Bridging Theory and Algorithm for Domain Adaptation
  Yuchen Zhang · Tianle Liu · Mingsheng Long · Michael Jordan
- 2019 Oral: Bridging Theory and Algorithm for Domain Adaptation
  Yuchen Zhang · Tianle Liu · Mingsheng Long · Michael Jordan
- 2019 Poster: Transferable Adversarial Training: A General Approach to Adapting Deep Classifiers
  Hong Liu · Mingsheng Long · Jianmin Wang · Michael Jordan
- 2019 Poster: Towards Accurate Model Selection in Deep Unsupervised Domain Adaptation
  Kaichao You · Ximei Wang · Mingsheng Long · Michael Jordan
- 2019 Poster: Transferability vs. Discriminability: Batch Spectral Penalization for Adversarial Domain Adaptation
  Xinyang Chen · Sinan Wang · Mingsheng Long · Jianmin Wang
- 2019 Oral: Towards Accurate Model Selection in Deep Unsupervised Domain Adaptation
  Kaichao You · Ximei Wang · Mingsheng Long · Michael Jordan
- 2019 Oral: Transferability vs. Discriminability: Batch Spectral Penalization for Adversarial Domain Adaptation
  Xinyang Chen · Sinan Wang · Mingsheng Long · Jianmin Wang
- 2019 Oral: Transferable Adversarial Training: A General Approach to Adapting Deep Classifiers
  Hong Liu · Mingsheng Long · Jianmin Wang · Michael Jordan
- 2018 Poster: PredRNN++: Towards A Resolution of the Deep-in-Time Dilemma in Spatiotemporal Predictive Learning
  Yunbo Wang · Zhifeng Gao · Mingsheng Long · Jianmin Wang · Philip Yu
- 2018 Oral: PredRNN++: Towards A Resolution of the Deep-in-Time Dilemma in Spatiotemporal Predictive Learning
  Yunbo Wang · Zhifeng Gao · Mingsheng Long · Jianmin Wang · Philip Yu
- 2017 Poster: Deep Transfer Learning with Joint Adaptation Networks
  Mingsheng Long · Han Zhu · Jianmin Wang · Michael Jordan
- 2017 Talk: Deep Transfer Learning with Joint Adaptation Networks
  Mingsheng Long · Han Zhu · Jianmin Wang · Michael Jordan