The Softmax function on top of a final linear layer is the de facto method for producing probability distributions in neural networks. In many applications, such as language modeling or text generation, this model has to produce distributions over large output vocabularies. Recently, this has been shown to have limited representational capacity due to its connection with the rank bottleneck in matrix factorization. However, little is known about the limitations of Linear-Softmax for quantities of practical interest such as cross entropy or mode estimation, a direction that we explore here. As an efficient and effective solution to alleviate this issue, we propose to learn parametric monotonic functions on top of the logits. We theoretically investigate the rank-increasing capabilities of such monotonic functions. Empirically, our method improves on two quality metrics over the traditional Linear-Softmax layer in synthetic and real language model experiments, adding little time or memory overhead, while remaining comparable to the more computationally expensive mixture of Softmaxes.
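To make the idea concrete, here is a minimal PyTorch-style sketch of a learnable monotonic pointwise non-linearity applied to the logits before the softmax. The specific parameterization (a sum of tanh bumps with softplus-constrained amplitudes and slopes), the class name MonotonicNonlinearity, num_terms, and the example dimensions are illustrative assumptions, not the paper's exact construction.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicNonlinearity(nn.Module):
    # Learnable, strictly increasing pointwise function applied to each logit:
    #   phi(z) = z + sum_k softplus(a_k) * tanh(softplus(b_k) * (z - c_k))
    # Every added term has a non-negative slope, so phi stays monotone in z.
    # (Illustrative parameterization only; the paper's exact form may differ.)
    def __init__(self, num_terms: int = 8):
        super().__init__()
        self.a = nn.Parameter(torch.zeros(num_terms))                 # amplitudes, made >= 0 via softplus
        self.b = nn.Parameter(torch.zeros(num_terms))                 # slopes, made >= 0 via softplus
        self.c = nn.Parameter(torch.linspace(-3.0, 3.0, num_terms))   # bump centers

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (..., vocab_size) logits from the final linear layer
        zs = z.unsqueeze(-1)                                          # (..., vocab_size, 1)
        bumps = F.softplus(self.a) * torch.tanh(F.softplus(self.b) * (zs - self.c))
        return z + bumps.sum(dim=-1)                                  # same shape as z, monotone per logit

# Usage: insert the non-linearity between the linear projection and the softmax.
vocab_size, hidden = 10000, 512
proj = nn.Linear(hidden, vocab_size)
phi = MonotonicNonlinearity()
h = torch.randn(32, hidden)                                           # batch of context vectors
log_probs = F.log_softmax(phi(proj(h)), dim=-1)                       # distribution over the vocabulary

Because the function acts elementwise and adds only a handful of parameters, the extra time and memory cost is small, in line with the abstract's comparison against the more expensive mixture of Softmaxes.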
Author Information
Octavian-Eugen Ganea (ETH Zurich)
Sylvain Gelly (Google Brain)
Gary Becigneul (ETHZ)
Aliaksei Severyn (Google)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Oral: Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities
  Thu. Jun 13, 11:25 -- 11:30 PM, Hall A
More from the Same Authors
- 2019 Poster: Parameter-Efficient Transfer Learning for NLP
  Neil Houlsby · Andrei Giurgiu · Stanislaw Jastrzebski · Bruna Morrone · Quentin de Laroussilhe · Andrea Gesmundo · Mona Attariyan · Sylvain Gelly
- 2019 Oral: Parameter-Efficient Transfer Learning for NLP
  Neil Houlsby · Andrei Giurgiu · Stanislaw Jastrzebski · Bruna Morrone · Quentin de Laroussilhe · Andrea Gesmundo · Mona Attariyan · Sylvain Gelly
- 2019 Poster: A Large-Scale Study on Regularization and Normalization in GANs
  Karol Kurach · Mario Lucic · Xiaohua Zhai · Marcin Michalski · Sylvain Gelly
- 2019 Oral: A Large-Scale Study on Regularization and Normalization in GANs
  Karol Kurach · Mario Lucic · Xiaohua Zhai · Marcin Michalski · Sylvain Gelly
- 2019 Poster: Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations
  Francesco Locatello · Stefan Bauer · Mario Lucic · Gunnar Ratsch · Sylvain Gelly · Bernhard Schölkopf · Olivier Bachem
- 2019 Poster: High-Fidelity Image Generation With Fewer Labels
  Mario Lucic · Michael Tschannen · Marvin Ritter · Xiaohua Zhai · Olivier Bachem · Sylvain Gelly
- 2019 Oral: High-Fidelity Image Generation With Fewer Labels
  Mario Lucic · Michael Tschannen · Marvin Ritter · Xiaohua Zhai · Olivier Bachem · Sylvain Gelly
- 2019 Oral: Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations
  Francesco Locatello · Stefan Bauer · Mario Lucic · Gunnar Ratsch · Sylvain Gelly · Bernhard Schölkopf · Olivier Bachem
- 2018 Poster: Hyperbolic Entailment Cones for Learning Hierarchical Embeddings
  Octavian-Eugen Ganea · Gary Becigneul · Thomas Hofmann
- 2018 Oral: Hyperbolic Entailment Cones for Learning Hierarchical Embeddings
  Octavian-Eugen Ganea · Gary Becigneul · Thomas Hofmann