Timezone: »
Recent works have highlighted the strength of the Transformer architecture on sequence tasks while, at the same time, neural architecture search (NAS) has begun to outperform human-designed models. Our goal is to apply NAS to search for a better alternative to the Transformer. We first construct a large search space inspired by the recent advances in feed-forward sequence models and then run evolutionary architecture search with warm starting by seeding our initial population with the Transformer. To directly search on the computationally expensive WMT 2014 English-German translation task, we develop the Progressive Dynamic Hurdles method, which allows us to dynamically allocate more resources to more promising candidate models. The architecture found in our experiments -- the Evolved Transformer -- demonstrates consistent improvement over the Transformer on four well-established language tasks: WMT 2014 English-German, WMT 2014 English-French, WMT 2014 English-Czech and LM1B. At a big model size, the Evolved Transformer establishes a new state-of-the-art BLEU score of 29.8 on WMT'14 English-German; at smaller sizes, it achieves the same quality as the original "big" Transformer with 37.6% less parameters and outperforms the Transformer by 0.7 BLEU at a mobile-friendly model size of 7M parameters.
Author Information
David So (Google Brain)
Quoc Le (Google Brain)
Chen Liang (Google Brain)
Related Events (a corresponding poster, oral, or spotlight)
-
2019 Oral: The Evolved Transformer »
Wed Jun 12th 12:05 -- 12:10 AM Room Hall A
More from the Same Authors
-
2020 Poster: Go Wide, Then Narrow: Efficient Training of Deep Thin Networks »
Denny Zhou · Mao Ye · Chen Chen · Tianjian Meng · Mingxing Tan · Xiaodan Song · Quoc Le · Qiang Liu · Dale Schuurmans -
2020 Poster: AutoML-Zero: Evolving Machine Learning Algorithms From Scratch »
Esteban Real · Chen Liang · David So · Quoc Le -
2019 Poster: Learning to Generalize from Sparse and Underspecified Rewards »
Rishabh Agarwal · Chen Liang · Dale Schuurmans · Mohammad Norouzi -
2019 Oral: Learning to Generalize from Sparse and Underspecified Rewards »
Rishabh Agarwal · Chen Liang · Dale Schuurmans · Mohammad Norouzi -
2019 Poster: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks »
Mingxing Tan · Quoc Le -
2019 Poster: The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study »
Daniel Park · Jascha Sohl-Dickstein · Quoc Le · Samuel Smith -
2019 Oral: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks »
Mingxing Tan · Quoc Le -
2019 Oral: The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study »
Daniel Park · Jascha Sohl-Dickstein · Quoc Le · Samuel Smith -
2018 Poster: Understanding and Simplifying One-Shot Architecture Search »
Gabriel Bender · Pieter-Jan Kindermans · Barret Zoph · Vijay Vasudevan · Quoc Le -
2018 Poster: Learning Longer-term Dependencies in RNNs with Auxiliary Losses »
Trieu H Trinh · Andrew Dai · Thang Luong · Quoc Le -
2018 Oral: Learning Longer-term Dependencies in RNNs with Auxiliary Losses »
Trieu H Trinh · Andrew Dai · Thang Luong · Quoc Le -
2018 Oral: Understanding and Simplifying One-Shot Architecture Search »
Gabriel Bender · Pieter-Jan Kindermans · Barret Zoph · Vijay Vasudevan · Quoc Le -
2018 Poster: Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games? »
Maithra Raghu · Alexander Irpan · Jacob Andreas · Bobby Kleinberg · Quoc Le · Jon Kleinberg -
2018 Oral: Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games? »
Maithra Raghu · Alexander Irpan · Jacob Andreas · Bobby Kleinberg · Quoc Le · Jon Kleinberg -
2018 Poster: Efficient Neural Architecture Search via Parameters Sharing »
Hieu Pham · Melody Guan · Barret Zoph · Quoc Le · Jeff Dean -
2018 Oral: Efficient Neural Architecture Search via Parameters Sharing »
Hieu Pham · Melody Guan · Barret Zoph · Quoc Le · Jeff Dean -
2017 Poster: Large-Scale Evolution of Image Classifiers »
Esteban Real · Sherry Moore · Andrew Selle · Saurabh Saxena · Yutaka Leon Suematsu · Jie Tan · Quoc Le · Alexey Kurakin -
2017 Poster: Neural Optimizer Search using Reinforcement Learning »
Irwan Bello · Barret Zoph · Vijay Vasudevan · Quoc Le -
2017 Poster: Device Placement Optimization with Reinforcement Learning »
Azalia Mirhoseini · Hieu Pham · Quoc Le · benoit steiner · Mohammad Norouzi · Rasmus Larsen · Yuefeng Zhou · Naveen Kumar · Samy Bengio · Jeff Dean -
2017 Talk: Neural Optimizer Search using Reinforcement Learning »
Irwan Bello · Barret Zoph · Vijay Vasudevan · Quoc Le -
2017 Talk: Large-Scale Evolution of Image Classifiers »
Esteban Real · Sherry Moore · Andrew Selle · Saurabh Saxena · Yutaka Leon Suematsu · Jie Tan · Quoc Le · Alexey Kurakin -
2017 Talk: Device Placement Optimization with Reinforcement Learning »
Azalia Mirhoseini · Hieu Pham · Quoc Le · benoit steiner · Mohammad Norouzi · Rasmus Larsen · Yuefeng Zhou · Naveen Kumar · Samy Bengio · Jeff Dean