Timezone: »
Attention is a key mechanism to enable nonparametric models in deep learning. Quite arguably it is the basis of most recent progress in deep learning models. Beyond its introduction in neural machine translation, it can be traced back to neuroscience. It was arguably introduced via the gating or forgetting mechanism of LSTMs. Over the past 5 years attention has been key to advancing the state of the art in areas as diverse as natural language processing, computer vision, speech recognition, image synthesis, solving traveling salesman problems, or reinforcement learning. This tutorial offers a coherent overview over various types of attention; efficient implementation using Jupyter notebooks which allow the audience a hands-on experience to replicate and apply attention mechanisms; and a textbook (www.d2l.ai) to allow the audience to dive more deeply into the underlying theory.
Author Information
Alex Smola (Amazon)
Aston Zhang (AWS AI)

Aston Zhang is a senior scientist at Amazon Web Services AI. His research interests are in deep learning. He received a Ph.D. in computer science from University of Illinois at Urbana-Champaign. He has served as an editorial board member for Frontiers in Big Data and a program committee member (reviewer) for ICML, NeurIPS, ICLR, WWW, KDD, SIGIR, and WSDM. His book Dive into Deep Learning (www.d2l.ai) has been used as a textbook in worldwide universities.
More from the Same Authors
-
2021 : Multimodal AutoML on Structured Tables with Text Fields »
Xingjian Shi · Jonas Mueller · Nick Erickson · Mu Li · Alex Smola -
2021 : Continuous Doubly Constrained Batch Reinforcement Learning »
Rasool Fakoor · Jonas Mueller · Kavosh Asadi · Pratik Chaudhari · Alex Smola -
2022 : Adaptive Interest for Emphatic Reinforcement Learning »
Martin Klissarov · Rasool Fakoor · Jonas Mueller · Kavosh Asadi · Taesup Kim · Alex Smola -
2023 Poster: RLSbench: Domain Adaptation Under Relaxed Label Shift »
Saurabh Garg · Nick Erickson · University of California James Sharpnack · Alex Smola · Sivaraman Balakrishnan · Zachary Lipton -
2022 : Discussion Panel »
Percy Liang · Léon Bottou · Jayashree Kalpathy-Cramer · Alex Smola -
2022 Poster: Partial and Asymmetric Contrastive Learning for Out-of-Distribution Detection in Long-Tailed Recognition »
Haotao Wang · Aston Zhang · Yi Zhu · Shuai Zheng · Mu Li · Alex Smola · Zhangyang “Atlas” Wang -
2022 Oral: Partial and Asymmetric Contrastive Learning for Out-of-Distribution Detection in Long-Tailed Recognition »
Haotao Wang · Aston Zhang · Yi Zhu · Shuai Zheng · Mu Li · Alex Smola · Zhangyang “Atlas” Wang -
2022 Poster: Removing Batch Normalization Boosts Adversarial Training »
Haotao Wang · Aston Zhang · Shuai Zheng · Xingjian Shi · Mu Li · Zhangyang “Atlas” Wang -
2022 Spotlight: Removing Batch Normalization Boosts Adversarial Training »
Haotao Wang · Aston Zhang · Shuai Zheng · Xingjian Shi · Mu Li · Zhangyang “Atlas” Wang -
2021 Poster: A Unified Lottery Ticket Hypothesis for Graph Neural Networks »
Tianlong Chen · Yongduo Sui · Xuxi Chen · Aston Zhang · Zhangyang “Atlas” Wang -
2021 Spotlight: A Unified Lottery Ticket Hypothesis for Graph Neural Networks »
Tianlong Chen · Yongduo Sui · Xuxi Chen · Aston Zhang · Zhangyang “Atlas” Wang -
2020 : Panel Discussion »
Neil Lawrence · Mihaela van der Schaar · Alex Smola · Valerio Perrone · Jack Parker-Holder · Zhengying Liu -
2020 : "AutoGluon and Distillation" by Alex Smola »
Alex Smola -
2020 Poster: ControlVAE: Controllable Variational Autoencoder »
Huajie Shao · Shuochao Yao · Dachun Sun · Aston Zhang · Shengzhong Liu · Dongxin Liu · Jun Wang · Tarek Abdelzaher -
2019 Poster: Deep Factors for Forecasting »
Yuyang Wang · Alex Smola · Danielle Robinson · Jan Gasthaus · Dean Foster · Tim Januschowski -
2019 Oral: Deep Factors for Forecasting »
Yuyang Wang · Alex Smola · Danielle Robinson · Jan Gasthaus · Dean Foster · Tim Januschowski -
2018 Poster: Learning Steady-States of Iterative Algorithms over Graphs »
Hanjun Dai · Zornitsa Kozareva · Bo Dai · Alex Smola · Le Song -
2018 Oral: Learning Steady-States of Iterative Algorithms over Graphs »
Hanjun Dai · Zornitsa Kozareva · Bo Dai · Alex Smola · Le Song -
2017 Poster: Canopy --- Fast Sampling with Cover Trees »
Manzil Zaheer · Satwik Kottur · Amr Ahmed · Jose Moura · Alex Smola -
2017 Talk: Canopy --- Fast Sampling with Cover Trees »
Manzil Zaheer · Satwik Kottur · Amr Ahmed · Jose Moura · Alex Smola -
2017 Poster: Latent LSTM Allocation: Joint clustering and non-linear dynamic modeling of sequence data »
Manzil Zaheer · Amr Ahmed · Alex Smola -
2017 Talk: Latent LSTM Allocation: Joint clustering and non-linear dynamic modeling of sequence data »
Manzil Zaheer · Amr Ahmed · Alex Smola -
2017 Tutorial: Distributed Deep Learning with MxNet Gluon »
Alex Smola · Aran Khanna