Skip to yearly menu bar Skip to main content


Self-Attention for Computer Vision

Aravind Srinivas · Prajit Ramachandran · Ashish Vaswani



The tutorial will be about the application of self-attention mechanisms in computer vision. Self-Attention has been widely adopted in NLP, with the fully attentional Transformer model having largely replaced RNNs and now being used in state-of-the-art language understanding models like GPT, BERT, XLNet, T5, Electra, and Meena. Thus, there has been a tremendous interest in studying whether self-attention can have a similarly big and far-reaching impact in computer vision. However, vision tasks have different properties compared to language tasks, so a lot of research has been devoted to exploring the best way to apply self-attention to visual models. This tutorial will cover many of the different applications of self-attention in vision in order to give the viewer a broad and precise understanding of this subfield.

Chat is not available.