The predominant approach to language modeling to date is based on recurrent neural networks. Their success on this task is often linked to their ability to capture unbounded context. In this paper we develop a finite-context approach based on stacked convolutions, which can be more efficient since it allows parallelization over sequential tokens. We propose a novel simplified gating mechanism that outperforms that of Oord et al. (2016) and investigate the impact of key architectural decisions. The proposed approach achieves state-of-the-art results on the WikiText-103 benchmark, even though the benchmark features long-term dependencies, as well as competitive results on the Google Billion Words benchmark. Our model reduces the latency to score a sentence by an order of magnitude compared to a recurrent baseline. To our knowledge, this is the first time a non-recurrent approach has been competitive with strong recurrent models on these large-scale language tasks.
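The simplified gating mechanism the abstract refers to is the gated linear unit (GLU) from the full paper, h(X) = (X*W + b) ⊗ σ(X*V + c). Below is a minimal PyTorch-style sketch of one gated convolutional block under that reading; the class and variable names are illustrative, not the authors' code. Left-padding keeps the convolution causal so position t never sees future tokens.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConvBlock(nn.Module):
    """One gated convolutional block: h = (X*W + b) * sigmoid(X*V + c).
    Both linear maps come from a single conv with doubled output channels."""
    def __init__(self, channels: int, kernel_size: int):
        super().__init__()
        self.left_pad = kernel_size - 1            # causal: pad only the past
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time); pad on the left so position t
        # only attends to tokens <= t (no look-ahead into future context).
        x = F.pad(x, (self.left_pad, 0))
        a, b = self.conv(x).chunk(2, dim=1)        # linear part and gate
        return a * torch.sigmoid(b)                # gated linear unit

# Usage sketch: stack blocks like this over embedded tokens.
x = torch.randn(2, 64, 10)                         # (batch, emb_dim, seq_len)
y = GatedConvBlock(64, kernel_size=3)(x)           # shape preserved: (2, 64, 10)
```

Unlike a recurrent cell, every output position here is computed from the same convolution in parallel, which is the source of the latency reduction the abstract claims.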
Author Information
Yann Dauphin (Facebook AI Research)
Angela Fan (Facebook AI Research)
Michael Auli (Facebook)
David Grangier (Facebook)
Related Events (a corresponding poster, oral, or spotlight)
- 2017 Talk: Language Modeling with Gated Convolutional Networks
  Wed. Aug 9th, 05:30 -- 05:48 AM, Room Parkside 1
More from the Same Authors
- 2023 Poster: Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
  Alexei Baevski · Arun Babu · Wei-Ning Hsu · Michael Auli
- 2023 Oral: Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
  Alexei Baevski · Arun Babu · Wei-Ning Hsu · Michael Auli
- 2022 Poster: data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
  Alexei Baevski · Wei-Ning Hsu · Qiantong Xu · Arun Babu · Jiatao Gu · Michael Auli
- 2022 Oral: data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
  Alexei Baevski · Wei-Ning Hsu · Qiantong Xu · Arun Babu · Jiatao Gu · Michael Auli
- 2019 Poster: Mixture Models for Diverse Machine Translation: Tricks of the Trade
  Tianxiao Shen · Myle Ott · Michael Auli · Marc'Aurelio Ranzato
- 2019 Oral: Mixture Models for Diverse Machine Translation: Tricks of the Trade
  Tianxiao Shen · Myle Ott · Michael Auli · Marc'Aurelio Ranzato
- 2018 Poster: Analyzing Uncertainty in Neural Machine Translation
  Myle Ott · Michael Auli · David Grangier · Marc'Aurelio Ranzato
- 2018 Oral: Analyzing Uncertainty in Neural Machine Translation
  Myle Ott · Michael Auli · David Grangier · Marc'Aurelio Ranzato
- 2017 Poster: Efficient softmax approximation for GPUs
  Edouard Grave · Armand Joulin · Moustapha Cisse · David Grangier · Herve Jegou
- 2017 Poster: Convolutional Sequence to Sequence Learning
  Jonas Gehring · Michael Auli · David Grangier · Denis Yarats · Yann Dauphin
- 2017 Poster: Parseval Networks: Improving Robustness to Adversarial Examples
  Moustapha Cisse · Piotr Bojanowski · Edouard Grave · Yann Dauphin · Nicolas Usunier
- 2017 Talk: Convolutional Sequence to Sequence Learning
  Jonas Gehring · Michael Auli · David Grangier · Denis Yarats · Yann Dauphin
- 2017 Talk: Efficient softmax approximation for GPUs
  Edouard Grave · Armand Joulin · Moustapha Cisse · David Grangier · Herve Jegou
- 2017 Talk: Parseval Networks: Improving Robustness to Adversarial Examples
  Moustapha Cisse · Piotr Bojanowski · Edouard Grave · Yann Dauphin · Nicolas Usunier