

A Tutorial on Attention in Deep Learning

Alex Smola · Aston Zhang

Hall A


Attention is a key mechanism for enabling nonparametric models in deep learning. It is arguably the basis of most recent progress in deep learning models. Beyond its introduction in neural machine translation, attention can be traced back to neuroscience, and it was arguably introduced via the gating or forgetting mechanism of LSTMs. Over the past 5 years, attention has been key to advancing the state of the art in areas as diverse as natural language processing, computer vision, speech recognition, image synthesis, solving traveling salesman problems, and reinforcement learning. This tutorial offers a coherent overview of various types of attention; efficient implementations in Jupyter notebooks, which give the audience hands-on experience replicating and applying attention mechanisms; and a textbook that lets the audience dive more deeply into the underlying theory.
