Autoregressive sequence models based on deep neural networks, such as RNNs, WaveNet, and the Transformer, are the state-of-the-art on many tasks. However, they lack parallelism and are thus slow for long sequences. RNNs lack parallelism both during training and decoding, while architectures like WaveNet and the Transformer are much more parallel during training but still lack parallelism during decoding. We present a method to extend sequence models using discrete latent variables that makes decoding much more parallel. The main idea behind this approach is to first autoencode the target sequence into a shorter discrete latent sequence, which is generated autoregressively, and finally decode the full sequence from this shorter latent sequence in a parallel manner. To this end, we introduce a new method for constructing discrete latent variables and compare it with previously introduced methods. Finally, we verify that our model works on the task of neural machine translation, where our models are an order of magnitude faster than comparable autoregressive models and, while lower in BLEU than purely autoregressive models, better than previously proposed non-autoregressive translation models.
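The speedup comes from replacing a token-by-token autoregressive loop over the full target with a much shorter loop over discrete latents followed by one parallel expansion step. The following is a minimal sketch of that decoding pipeline, not the paper's exact architecture: the component names (latent_prior_step, parallel_decoder), the compression factor c = 8, and the vocabulary sizes are illustrative assumptions, with random stand-ins where real model calls would go.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, latent_vocab, c = 32000, 512, 8   # c: target tokens per discrete latent (assumed)

def latent_prior_step(latent_prefix, source_repr):
    """Hypothetical autoregressive prior: predicts the next discrete latent symbol."""
    return int(rng.integers(latent_vocab))     # stand-in for a real model call

def parallel_decoder(latents, source_repr, target_len):
    """Hypothetical decoder: emits all target tokens in a single parallel pass."""
    return rng.integers(vocab_size, size=target_len)  # stand-in for a real model call

def decode(source_repr, target_len):
    # 1) Autoregressive loop, but only over the short latent sequence:
    #    target_len // c sequential steps instead of target_len.
    latents = []
    for _ in range(target_len // c):
        latents.append(latent_prior_step(latents, source_repr))
    # 2) One parallel pass expands the latents into the full target sequence.
    return parallel_decoder(np.array(latents), source_repr, target_len)

tokens = decode(source_repr=None, target_len=64)   # 8 sequential latent steps, not 64
```

Under these assumptions, sequential work shrinks by roughly the compression factor c, which is the source of the order-of-magnitude decoding speedup reported in the abstract.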
Author Information
Lukasz Kaiser (Google)
Samy Bengio (Google Brain)
Aurko Roy (Google Brain)
Ashish Vaswani (Google Brain)
Niki Parmar (Google)
Jakob Uszkoreit
Noam Shazeer (Google)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Poster: Fast Decoding in Sequence Models Using Discrete Latent Variables »
  Thu. Jul 12th 04:15 -- 07:00 PM Room Hall B #73
More from the Same Authors
- 2023 Poster: Generalization on the Unseen, Logic Reasoning and Degree Curriculum »
  Emmanuel Abbe · Samy Bengio · Aryo Lotfi · Kevin Rizk
- 2023 Oral: Generalization on the Unseen, Logic Reasoning and Degree Curriculum »
  Emmanuel Abbe · Samy Bengio · Aryo Lotfi · Kevin Rizk
- 2021 Tutorial: Self-Attention for Computer Vision »
  Aravind Srinivas · Prajit Ramachandran · Ashish Vaswani
- 2021 Talk: Self-Attention for Computer Vision »
  Ashish Vaswani
- 2020 Affinity Workshop: New In ML »
  Zhen Xu · Sparkle Russell-Puleri · Zhengying Liu · Sinead A Williamson · Matthias W Seeger · Wei-Wei Tu · Samy Bengio · Isabelle Guyon
- 2019 Workshop: Identifying and Understanding Deep Learning Phenomena »
  Hanie Sedghi · Samy Bengio · Kenji Hata · Aleksander Madry · Ari Morcos · Behnam Neyshabur · Maithra Raghu · Ali Rahimi · Ludwig Schmidt · Ying Xiao
- 2019 Poster: Area Attention »
  Yang Li · Lukasz Kaiser · Samy Bengio · Si Si
- 2019 Oral: Area Attention »
  Yang Li · Lukasz Kaiser · Samy Bengio · Si Si
- 2018 Poster: Image Transformer »
  Niki Parmar · Ashish Vaswani · Jakob Uszkoreit · Lukasz Kaiser · Noam Shazeer · Alexander Ku · Dustin Tran
- 2018 Oral: Image Transformer »
  Niki Parmar · Ashish Vaswani · Jakob Uszkoreit · Lukasz Kaiser · Noam Shazeer · Alexander Ku · Dustin Tran
- 2018 Poster: Adafactor: Adaptive Learning Rates with Sublinear Memory Cost »
  Noam Shazeer · Mitchell Stern
- 2018 Oral: Adafactor: Adaptive Learning Rates with Sublinear Memory Cost »
  Noam Shazeer · Mitchell Stern
- 2017 Workshop: Reproducibility in Machine Learning Research »
  Rosemary Nan Ke · Anirudh Goyal · Alex Lamb · Joelle Pineau · Samy Bengio · Yoshua Bengio
- 2017 Poster: Device Placement Optimization with Reinforcement Learning »
  Azalia Mirhoseini · Hieu Pham · Quoc Le · Benoit Steiner · Mohammad Norouzi · Rasmus Larsen · Yuefeng Zhou · Naveen Kumar · Samy Bengio · Jeff Dean
- 2017 Talk: Device Placement Optimization with Reinforcement Learning »
  Azalia Mirhoseini · Hieu Pham · Quoc Le · Benoit Steiner · Mohammad Norouzi · Rasmus Larsen · Yuefeng Zhou · Naveen Kumar · Samy Bengio · Jeff Dean
- 2017 Poster: Sharp Minima Can Generalize For Deep Nets »
  Laurent Dinh · Razvan Pascanu · Samy Bengio · Yoshua Bengio
- 2017 Talk: Sharp Minima Can Generalize For Deep Nets »
  Laurent Dinh · Razvan Pascanu · Samy Bengio · Yoshua Bengio