We propose an approximate strategy to efficiently train neural-network-based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the unbalanced word distribution to form clusters that explicitly minimize the expected computation time. It further reduces the computational cost by exploiting the specificities of modern architectures and matrix-matrix and matrix-vector operations, making it particularly suited for graphics processing units. Our experiments on standard benchmarks, such as EuroParl and One Billion Word, show that our approach brings a large gain in efficiency over standard approximations while achieving accuracy close to that of the full softmax. Code for our method is available at https://github.com/facebookresearch/adaptive-softmax.
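The clustering idea is straightforward to try out: PyTorch ships an implementation of this adaptive softmax as torch.nn.AdaptiveLogSoftmaxWithLoss. The sketch below is a minimal, self-contained usage example; the vocabulary size, hidden dimension, and cutoff values are illustrative assumptions, not settings from the paper.

```python
# Minimal sketch of adaptive softmax via PyTorch's built-in
# torch.nn.AdaptiveLogSoftmaxWithLoss (which implements this method).
# All sizes below are illustrative assumptions, not values from the paper.
import torch
import torch.nn as nn

vocab_size = 100_000  # assumed vocabulary, sorted by decreasing word frequency
hidden_dim = 512      # assumed hidden size of the language model

# Cutoffs partition the sorted vocabulary: ids [0, 2000) form the frequent
# "head" cluster, [2000, 20000) and [20000, 100000) form rarer tail clusters
# whose projections are shrunk by div_value to save computation.
adaptive = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden_dim,
    n_classes=vocab_size,
    cutoffs=[2_000, 20_000],  # assumed cluster boundaries
    div_value=4.0,
)

hidden = torch.randn(32, hidden_dim)           # batch of hidden states
targets = torch.randint(0, vocab_size, (32,))  # next-word ids

out = adaptive(hidden, targets)
print(out.loss)  # mean negative log-likelihood; backpropagate as usual

# Full (N, vocab_size) log-probabilities, e.g. for evaluation:
log_probs = adaptive.log_prob(hidden)
```

Because words are sorted by decreasing frequency, most targets fall in the small head cluster, so the larger tail projections are evaluated only for rare words; this is what keeps the expected cost far below that of a full softmax over the whole vocabulary.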
Author Information
Edouard Grave (Facebook AI Research)
Armand Joulin (Facebook)
Moustapha Cisse
David Grangier (Facebook)
Herve Jegou (Facebook AI Research)
Related Events (a corresponding poster, oral, or spotlight)

- 2017 Talk: Efficient softmax approximation for GPUs »
  Wed. Aug 9th 01:42 -- 02:00 AM, Room: Darling Harbour Theatre
More from the Same Authors
- 2021 Poster: Training data-efficient image transformers & distillation through attention »
  Hugo Touvron · Matthieu Cord · Matthijs Douze · Francisco Massa · Alexandre Sablayrolles · Herve Jegou
- 2021 Spotlight: Training data-efficient image transformers & distillation through attention »
  Hugo Touvron · Matthieu Cord · Matthijs Douze · Francisco Massa · Alexandre Sablayrolles · Herve Jegou
- 2020 Poster: Radioactive data: tracing through training »
  Alexandre Sablayrolles · Matthijs Douze · Cordelia Schmid · Herve Jegou
- 2019 Poster: White-box vs Black-box: Bayes Optimal Strategies for Membership Inference »
  Alexandre Sablayrolles · Matthijs Douze · Cordelia Schmid · Yann Ollivier · Herve Jegou
- 2019 Oral: White-box vs Black-box: Bayes Optimal Strategies for Membership Inference »
  Alexandre Sablayrolles · Matthijs Douze · Cordelia Schmid · Yann Ollivier · Herve Jegou
- 2018 Poster: Analyzing Uncertainty in Neural Machine Translation »
  Myle Ott · Michael Auli · David Grangier · Marc'Aurelio Ranzato
- 2018 Oral: Analyzing Uncertainty in Neural Machine Translation »
  Myle Ott · Michael Auli · David Grangier · Marc'Aurelio Ranzato
- 2018 Poster: Optimizing the Latent Space of Generative Networks »
  Piotr Bojanowski · Armand Joulin · David Lopez-Paz · Arthur Szlam
- 2018 Oral: Optimizing the Latent Space of Generative Networks »
  Piotr Bojanowski · Armand Joulin · David Lopez-Paz · Arthur Szlam
- 2017 Poster: Convolutional Sequence to Sequence Learning »
  Jonas Gehring · Michael Auli · David Grangier · Denis Yarats · Yann Dauphin
- 2017 Poster: Language Modeling with Gated Convolutional Networks »
  Yann Dauphin · Angela Fan · Michael Auli · David Grangier
- 2017 Talk: Convolutional Sequence to Sequence Learning »
  Jonas Gehring · Michael Auli · David Grangier · Denis Yarats · Yann Dauphin
- 2017 Talk: Language Modeling with Gated Convolutional Networks »
  Yann Dauphin · Angela Fan · Michael Auli · David Grangier
- 2017 Poster: Unsupervised Learning by Predicting Noise »
  Piotr Bojanowski · Armand Joulin
- 2017 Talk: Unsupervised Learning by Predicting Noise »
  Piotr Bojanowski · Armand Joulin