Existing attention mechanisms are trained to attend to individual items in a collection (the memory) with a predefined, fixed granularity, e.g., a word token or an image grid. We propose area attention: a way to attend to areas in the memory, where each area contains a group of items that are structurally adjacent, e.g., spatially for a 2D memory such as images, or temporally for a 1D memory such as natural language sentences. Importantly, the shape and the size of an area are dynamically determined via learning, which enables a model to attend to information with varying granularity. Area attention can easily work with existing model architectures such as multi-head attention for simultaneously attending to multiple areas in the memory. We evaluate area attention on two tasks: neural machine translation (both character-level and token-level) and image captioning, and improve upon strong (state-of-the-art) baselines in all cases. These improvements are obtainable with a basic form of area attention that is parameter-free.
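To make the idea of attending to areas concrete, the following is a minimal sketch for a 1D memory, not the authors' implementation. It assumes one plausible parameter-free choice that the abstract does not spell out: an area's key is the mean of its items' keys, and its value is the sum of its items' values. The names `enumerate_areas`, `area_attention_1d`, and `max_width` are hypothetical and introduced only for illustration.

```python
import math


def enumerate_areas(n, max_width):
    """Return every contiguous span (start, end) of width 1..max_width."""
    return [(s, s + w) for w in range(1, max_width + 1) for s in range(n - w + 1)]


def area_attention_1d(query, keys, values, max_width=2):
    """Dot-product attention over all contiguous areas of a 1D memory.

    Assumption (illustrative): area key = mean of item keys,
    area value = sum of item values. No learned parameters are used.
    """
    areas = enumerate_areas(len(keys), max_width)
    key_dim, value_dim = len(query), len(values[0])

    area_keys, area_values = [], []
    for start, end in areas:
        width = end - start
        # Mean of the keys of the items inside the area.
        area_keys.append(
            [sum(k[d] for k in keys[start:end]) / width for d in range(key_dim)]
        )
        # Sum of the values of the items inside the area.
        area_values.append(
            [sum(v[d] for v in values[start:end]) for d in range(value_dim)]
        )

    # Softmax over the query-key dot products of all areas.
    logits = [sum(q * k for q, k in zip(query, ak)) for ak in area_keys]
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted combination of the area values.
    output = [
        sum(w * av[d] for w, av in zip(weights, area_values))
        for d in range(value_dim)
    ]
    return output, weights, areas


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0], [2.0], [3.0]]
output, weights, areas = area_attention_1d([0.0, 0.0], keys, values, max_width=2)
```

With `max_width=1` every area is a single item, so the sketch reduces to ordinary attention over individual items; larger widths add coarser areas to the same softmax, which is how the granularity can vary.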
Author Information
Yang Li (Google Research)
Lukasz Kaiser (Google)
Samy Bengio (Google Research Brain Team)
Si Si (Google Research)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Poster: Area Attention
  Wed. Jun 12th 01:30 -- 04:00 AM, Room Pacific Ballroom #27
More from the Same Authors
- 2023: Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus
  Gang Li · Yang Li
- 2023: Towards Semantically-Aware UI Design Tools: Design, Implementation and Evaluation of Semantic Grouping Guidelines
  Peitong Duan · Bjorn Hartmann · Karina Nguyen · Yang Li · Marti Hearst · Meredith Morris
- 2023 Workshop: Artificial Intelligence & Human Computer Interaction
  Yang Li · Ranjay Krishna · Helena Vasconcelos · Bryan Wang · Forrest Huang
- 2023: Panel on Reasoning Capabilities of LLMs
  Guy Van den Broeck · Ishita Dasgupta · Subbarao Kambhampati · Jiajun Wu · Xi Victoria Lin · Samy Bengio · Beliz Gunel
- 2023: Generalization on the Unseen, Logic Reasoning and Degree Curriculum
  Samy Bengio
- 2023 Panel: The Societal Impacts of AI
  Sanmi Koyejo · Samy Bengio · Ashia Wilson · Kirikowhai Mikaere · Joelle Pineau
- 2023 Poster: PLay: Parametrically Conditioned Layout Generation using Latent Diffusion
  Chin-Yi Cheng · Forrest Huang · Gang Li · Yang Li
- 2023 Poster: Generalization on the Unseen, Logic Reasoning and Degree Curriculum
  Emmanuel Abbe · Samy Bengio · Aryo Lotfi · Kevin Rizk
- 2023 Oral: Generalization on the Unseen, Logic Reasoning and Degree Curriculum
  Emmanuel Abbe · Samy Bengio · Aryo Lotfi · Kevin Rizk
- 2023 Poster: Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory
  Justin Cui · Ruochen Wang · Si Si · Cho-Jui Hsieh
- 2020 Affinity Workshop: New In ML
  Zhen Xu · Sparkle Russell-Puleri · Zhengying Liu · Sinead A Williamson · Matthias W Seeger · Wei-Wei Tu · Samy Bengio · Isabelle Guyon
- 2019 Workshop: Identifying and Understanding Deep Learning Phenomena
  Hanie Sedghi · Samy Bengio · Kenji Hata · Aleksander Madry · Ari Morcos · Behnam Neyshabur · Maithra Raghu · Ali Rahimi · Ludwig Schmidt · Ying Xiao
- 2018 Poster: Image Transformer
  Niki Parmar · Ashish Vaswani · Jakob Uszkoreit · Lukasz Kaiser · Noam Shazeer · Alexander Ku · Dustin Tran
- 2018 Poster: Fast Decoding in Sequence Models Using Discrete Latent Variables
  Lukasz Kaiser · Samy Bengio · Aurko Roy · Ashish Vaswani · Niki Parmar · Jakob Uszkoreit · Noam Shazeer
- 2018 Oral: Image Transformer
  Niki Parmar · Ashish Vaswani · Jakob Uszkoreit · Lukasz Kaiser · Noam Shazeer · Alexander Ku · Dustin Tran
- 2018 Oral: Fast Decoding in Sequence Models Using Discrete Latent Variables
  Lukasz Kaiser · Samy Bengio · Aurko Roy · Ashish Vaswani · Niki Parmar · Jakob Uszkoreit · Noam Shazeer
- 2017 Workshop: Reproducibility in Machine Learning Research
  Rosemary Nan Ke · Anirudh Goyal · Alex Lamb · Joelle Pineau · Samy Bengio · Yoshua Bengio
- 2017 Poster: Device Placement Optimization with Reinforcement Learning
  Azalia Mirhoseini · Hieu Pham · Quoc Le · Benoit Steiner · Mohammad Norouzi · Rasmus Larsen · Yuefeng Zhou · Naveen Kumar · Samy Bengio · Jeff Dean
- 2017 Talk: Device Placement Optimization with Reinforcement Learning
  Azalia Mirhoseini · Hieu Pham · Quoc Le · Benoit Steiner · Mohammad Norouzi · Rasmus Larsen · Yuefeng Zhou · Naveen Kumar · Samy Bengio · Jeff Dean
- 2017 Poster: Sharp Minima Can Generalize For Deep Nets
  Laurent Dinh · Razvan Pascanu · Samy Bengio · Yoshua Bengio
- 2017 Poster: Gradient Boosted Decision Trees for High Dimensional Sparse Output
  Si Si · Huan Zhang · Sathiya Keerthi · Dhruv Mahajan · Inderjit Dhillon · Cho-Jui Hsieh
- 2017 Talk: Gradient Boosted Decision Trees for High Dimensional Sparse Output
  Si Si · Huan Zhang · Sathiya Keerthi · Dhruv Mahajan · Inderjit Dhillon · Cho-Jui Hsieh
- 2017 Talk: Sharp Minima Can Generalize For Deep Nets
  Laurent Dinh · Razvan Pascanu · Samy Bengio · Yoshua Bengio