The prevalent approach to sequence-to-sequence learning maps an input sequence to a variable-length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. Compared to recurrent models, computations over all elements can be fully parallelized during training to better exploit GPU hardware, and optimization is easier since the number of non-linearities is fixed and independent of the input length. Our use of gated linear units eases gradient propagation, and we equip each decoder layer with a separate attention module. We outperform the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French translation at an order of magnitude faster speed, both on GPU and CPU.
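The key non-linearity named in the abstract is the gated linear unit (GLU). Below is a minimal PyTorch sketch of one gated convolutional block, assuming a 1-D convolution over token representations; it is illustrative only. The paper's decoder additionally uses causal (left-padded) convolutions and residual connections, which are omitted here, and the class name and sizes are assumptions, not the authors' fairseq code.

```python
import torch
import torch.nn as nn

class GatedConvBlock(nn.Module):
    """One convolutional block with a gated linear unit (GLU).

    The convolution produces 2 * out_channels feature maps; half carry
    content and half act as sigmoid gates: GLU(A, B) = A * sigmoid(B).
    Hypothetical sketch, not the paper's exact implementation.
    """

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        # "Same" padding preserves sequence length; the paper's decoder
        # would instead pad only on the left to keep the model causal.
        self.conv = nn.Conv1d(
            in_channels, 2 * out_channels, kernel_size,
            padding=kernel_size // 2,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, sequence_length)
        a, b = self.conv(x).chunk(2, dim=1)  # split into content and gate
        return a * torch.sigmoid(b)          # gated linear unit

# Toy usage: a batch of 4 sequences, 256 channels, 20 time steps.
block = GatedConvBlock(256, 256)
y = block(torch.randn(4, 256, 20))
print(y.shape)  # torch.Size([4, 256, 20])
```

Because every position is computed by the same convolution, the whole sequence can be processed in one parallel pass during training, which is the speed advantage over recurrent models claimed in the abstract.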
Author Information
Jonas Gehring (Facebook AI Research)
Michael Auli (Facebook)
David Grangier (Facebook)
Denis Yarats (Facebook AI Research)
Yann Dauphin (Facebook AI Research)
Related Events (a corresponding poster, oral, or spotlight)
- 2017 Poster: Convolutional Sequence to Sequence Learning
  Wed, Aug 9, 08:30 AM -- 12:00 PM, Room Gallery #114
More from the Same Authors
- 2023 Poster: Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
  Alexei Baevski · Arun Babu · Wei-Ning Hsu · Michael Auli
- 2023 Oral: Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
  Alexei Baevski · Arun Babu · Wei-Ning Hsu · Michael Auli
- 2022 Poster: data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
  Alexei Baevski · Wei-Ning Hsu · Qiantong Xu · Arun Babu · Jiatao Gu · Michael Auli
- 2022 Oral: data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
  Alexei Baevski · Wei-Ning Hsu · Qiantong Xu · Arun Babu · Jiatao Gu · Michael Auli
- 2019 Poster: Mixture Models for Diverse Machine Translation: Tricks of the Trade
  Tianxiao Shen · Myle Ott · Michael Auli · Marc'Aurelio Ranzato
- 2019 Oral: Mixture Models for Diverse Machine Translation: Tricks of the Trade
  Tianxiao Shen · Myle Ott · Michael Auli · Marc'Aurelio Ranzato
- 2018 Poster: Hierarchical Text Generation and Planning for Strategic Dialogue
  Denis Yarats · Mike Lewis
- 2018 Poster: Analyzing Uncertainty in Neural Machine Translation
  Myle Ott · Michael Auli · David Grangier · Marc'Aurelio Ranzato
- 2018 Oral: Hierarchical Text Generation and Planning for Strategic Dialogue
  Denis Yarats · Mike Lewis
- 2018 Oral: Analyzing Uncertainty in Neural Machine Translation
  Myle Ott · Michael Auli · David Grangier · Marc'Aurelio Ranzato
- 2017 Poster: Efficient softmax approximation for GPUs
  Edouard Grave · Armand Joulin · Moustapha Cisse · David Grangier · Herve Jegou
- 2017 Poster: Parseval Networks: Improving Robustness to Adversarial Examples
  Moustapha Cisse · Piotr Bojanowski · Edouard Grave · Yann Dauphin · Nicolas Usunier
- 2017 Poster: Language Modeling with Gated Convolutional Networks
  Yann Dauphin · Angela Fan · Michael Auli · David Grangier
- 2017 Talk: Language Modeling with Gated Convolutional Networks
  Yann Dauphin · Angela Fan · Michael Auli · David Grangier
- 2017 Talk: Efficient softmax approximation for GPUs
  Edouard Grave · Armand Joulin · Moustapha Cisse · David Grangier · Herve Jegou
- 2017 Talk: Parseval Networks: Improving Robustness to Adversarial Examples
  Moustapha Cisse · Piotr Bojanowski · Edouard Grave · Yann Dauphin · Nicolas Usunier