Recently, substantial progress has been made in language modeling by using deep neural networks. In practice, however, large-scale neural language models have been shown to be prone to overfitting. In this paper, we present a simple yet highly effective adversarial training mechanism for regularizing neural language models. The idea is to introduce adversarial noise to the output embedding layer while training the models. We show that the optimal adversarial noise has a simple closed-form solution, which allows us to develop a simple and time-efficient algorithm. Theoretically, we show that our adversarial mechanism effectively encourages the diversity of the embedding vectors, helping to increase the robustness of models. Empirically, we show that our method improves on the single-model state-of-the-art results for language modeling on Penn Treebank (PTB) and Wikitext-2, achieving test perplexity scores of 46.42 and 38.65, respectively. When applied to machine translation, our method improves over various transformer-based translation baselines in BLEU scores on the WMT14 English-German and IWSLT14 German-English tasks.
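To illustrate why a closed form can arise, here is a minimal sketch under assumptions that the abstract does not spell out: a softmax output layer with logits w_k · h, a perturbation delta applied only to the target word's output embedding, and an L2 constraint ||delta|| <= eps. Under those assumptions the cross-entropy loss is strictly decreasing in the target logit, so the loss-maximizing perturbation is delta = -eps * h / ||h||. The function name and its parameters are hypothetical and not taken from the paper.

```python
import numpy as np

def adversarial_softmax_loss(W, h, y, eps):
    """Cross-entropy loss with the target word's output embedding perturbed.

    Assumed setup (illustrative only):
      W   : (V, d) output embedding matrix, one row per vocabulary word
      h   : (d,) context vector produced by the language model
      y   : index of the target word
      eps : radius of the L2 ball that bounds the perturbation
    Returns the perturbed loss and the perturbation itself.
    """
    norm = np.linalg.norm(h)
    # The loss decreases as the target logit grows, so the worst case
    # pushes the target embedding directly against h.
    delta = -eps * h / norm if norm > 0 else np.zeros_like(h)

    logits = W @ h
    logits[y] += delta @ h          # same as logits[y] - eps * ||h||
    logits = logits - logits.max()  # shift for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[y], delta
```

Calling the function with `eps=0.0` recovers the ordinary cross-entropy loss, which gives a convenient baseline for checking that the perturbed loss is never smaller.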
Author Information
Dilin Wang (UT Austin)
Chengyue Gong (University of Texas at Austin)
Qiang Liu (UT Austin)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Poster: Improving Neural Language Modeling via Adversarial Training
  Fri Jun 14th 01:30 -- 04:00 AM, Room Pacific Ballroom
More from the Same Authors
- 2020 Poster: Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection
  Mao Ye · Chengyue Gong · Lizhen Nie · Denny Zhou · Adam Klivans · Qiang Liu
- 2020 Poster: Go Wide, Then Narrow: Efficient Training of Deep Thin Networks
  Denny Zhou · Mao Ye · Chen Chen · Tianjian Meng · Mingxing Tan · Xiaodan Song · Quoc Le · Qiang Liu · Dale Schuurmans
- 2020 Poster: Accountable Off-Policy Evaluation With Kernel Bellman Statistics
  Yihao Feng · Tongzheng Ren · Ziyang Tang · Qiang Liu
- 2020 Poster: A Chance-Constrained Generative Framework for Sequence Optimization
  Xianggen Liu · Qiang Liu · Sen Song · Jian Peng
- 2019 Workshop: Stein’s Method for Machine Learning and Statistics
  Francois-Xavier Briol · Lester Mackey · Chris Oates · Qiang Liu · Larry Goldstein
- 2019 Poster: Quantile Stein Variational Gradient Descent for Batch Bayesian Optimization
  Chengyue Gong · Jian Peng · Qiang Liu
- 2019 Poster: Nonlinear Stein Variational Gradient Descent for Learning Diversified Mixture Models
  Dilin Wang · Qiang Liu
- 2019 Oral: Quantile Stein Variational Gradient Descent for Batch Bayesian Optimization
  Chengyue Gong · Jian Peng · Qiang Liu
- 2019 Oral: Nonlinear Stein Variational Gradient Descent for Learning Diversified Mixture Models
  Dilin Wang · Qiang Liu
- 2018 Poster: Learning to Explore via Meta-Policy Gradient
  Tianbing Xu · Qiang Liu · Liang Zhao · Jian Peng
- 2018 Poster: Stein Variational Gradient Descent Without Gradient
  Jun Han · Qiang Liu
- 2018 Oral: Stein Variational Gradient Descent Without Gradient
  Jun Han · Qiang Liu
- 2018 Oral: Learning to Explore via Meta-Policy Gradient
  Tianbing Xu · Qiang Liu · Liang Zhao · Jian Peng
- 2018 Poster: Goodness-of-fit Testing for Discrete Distributions via Stein Discrepancy
  Jiasen Yang · Qiang Liu · Vinayak A Rao · Jennifer Neville
- 2018 Poster: Stein Variational Message Passing for Continuous Graphical Models
  Dilin Wang · Zhe Zeng · Qiang Liu
- 2018 Oral: Goodness-of-fit Testing for Discrete Distributions via Stein Discrepancy
  Jiasen Yang · Qiang Liu · Vinayak A Rao · Jennifer Neville
- 2018 Oral: Stein Variational Message Passing for Continuous Graphical Models
  Dilin Wang · Zhe Zeng · Qiang Liu