Timezone: »
Recent works have highlighted the strength of the Transformer architecture on sequence tasks while, at the same time, neural architecture search (NAS) has begun to outperform human-designed models. Our goal is to apply NAS to search for a better alternative to the Transformer. We first construct a large search space inspired by the recent advances in feed-forward sequence models and then run evolutionary architecture search with warm starting by seeding our initial population with the Transformer. To directly search on the computationally expensive WMT 2014 English-German translation task, we develop the Progressive Dynamic Hurdles method, which allows us to dynamically allocate more resources to more promising candidate models. The architecture found in our experiments -- the Evolved Transformer -- demonstrates consistent improvement over the Transformer on four well-established language tasks: WMT 2014 English-German, WMT 2014 English-French, WMT 2014 English-Czech and LM1B. At a big model size, the Evolved Transformer establishes a new state-of-the-art BLEU score of 29.8 on WMT'14 English-German; at smaller sizes, it achieves the same quality as the original "big" Transformer with 37.6% less parameters and outperforms the Transformer by 0.7 BLEU at a mobile-friendly model size of 7M parameters.
Author Information
David So (Google Brain)
Quoc Le (Google Brain)
Chen Liang (Google Brain)
Related Events (a corresponding poster, oral, or spotlight)
-
2019 Oral: The Evolved Transformer »
Wed. Jun 12th 12:05 -- 12:10 AM Room Hall A
More from the Same Authors
-
2023 Poster: The Flan Collection: Designing Data and Methods for Effective Instruction Tuning »
Shayne Longpre · Le Hou · Tu Vu · Albert Webson · Hyung Won Chung · Yi Tay · Denny Zhou · Quoc Le · Barret Zoph · Jason Wei · Adam Roberts -
2023 Poster: Brainformers: Trading Simplicity for Efficiency »
Yanqi Zhou · Nan Du · Yanping Huang · Daiyi Peng · Chang Lan · Da Huang · Siamak Shakeri · David So · Andrew Dai · Yifeng Lu · Zhifeng Chen · Quoc Le · Claire Cui · James Laudon · Jeff Dean -
2022 Poster: Transformer Quality in Linear Time »
Weizhe Hua · Zihang Dai · Hanxiao Liu · Quoc Le -
2022 Poster: GLaM: Efficient Scaling of Language Models with Mixture-of-Experts »
Nan Du · Yanping Huang · Andrew Dai · Simon Tong · Dmitry Lepikhin · Yuanzhong Xu · Maxim Krikun · Yanqi Zhou · Adams Wei Yu · Orhan Firat · Barret Zoph · William Fedus · Maarten Bosma · Zongwei Zhou · Tao Wang · Emma Wang · Kellie Webster · Marie Pellat · Kevin Robinson · Kathleen Meier-Hellstern · Toju Duke · Lucas Dixon · Kun Zhang · Quoc Le · Yonghui Wu · Zhifeng Chen · Claire Cui -
2022 Spotlight: GLaM: Efficient Scaling of Language Models with Mixture-of-Experts »
Nan Du · Yanping Huang · Andrew Dai · Simon Tong · Dmitry Lepikhin · Yuanzhong Xu · Maxim Krikun · Yanqi Zhou · Adams Wei Yu · Orhan Firat · Barret Zoph · William Fedus · Maarten Bosma · Zongwei Zhou · Tao Wang · Emma Wang · Kellie Webster · Marie Pellat · Kevin Robinson · Kathleen Meier-Hellstern · Toju Duke · Lucas Dixon · Kun Zhang · Quoc Le · Yonghui Wu · Zhifeng Chen · Claire Cui -
2022 Spotlight: Transformer Quality in Linear Time »
Weizhe Hua · Zihang Dai · Hanxiao Liu · Quoc Le -
2021 Poster: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision »
Chao Jia · Yinfei Yang · Ye Xia · Yi-Ting Chen · Zarana Parekh · Hieu Pham · Quoc Le · Yun-Hsuan Sung · Zhen Li · Tom Duerig -
2021 Oral: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision »
Chao Jia · Yinfei Yang · Ye Xia · Yi-Ting Chen · Zarana Parekh · Hieu Pham · Quoc Le · Yun-Hsuan Sung · Zhen Li · Tom Duerig -
2021 Poster: EfficientNetV2: Smaller Models and Faster Training »
Mingxing Tan · Quoc Le -
2021 Poster: Towards Domain-Agnostic Contrastive Learning »
Vikas Verma · Thang Luong · Kenji Kawaguchi · Hieu Pham · Quoc Le -
2021 Spotlight: EfficientNetV2: Smaller Models and Faster Training »
Mingxing Tan · Quoc Le -
2021 Spotlight: Towards Domain-Agnostic Contrastive Learning »
Vikas Verma · Thang Luong · Kenji Kawaguchi · Hieu Pham · Quoc Le -
2020 Poster: Go Wide, Then Narrow: Efficient Training of Deep Thin Networks »
Denny Zhou · Mao Ye · Chen Chen · Tianjian Meng · Mingxing Tan · Xiaodan Song · Quoc Le · Qiang Liu · Dale Schuurmans -
2020 Poster: AutoML-Zero: Evolving Machine Learning Algorithms From Scratch »
Esteban Real · Chen Liang · David So · Quoc Le -
2019 Poster: Learning to Generalize from Sparse and Underspecified Rewards »
Rishabh Agarwal · Chen Liang · Dale Schuurmans · Mohammad Norouzi -
2019 Oral: Learning to Generalize from Sparse and Underspecified Rewards »
Rishabh Agarwal · Chen Liang · Dale Schuurmans · Mohammad Norouzi -
2019 Poster: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks »
Mingxing Tan · Quoc Le -
2019 Poster: The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study »
Daniel Park · Jascha Sohl-Dickstein · Quoc Le · Samuel Smith -
2019 Oral: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks »
Mingxing Tan · Quoc Le -
2019 Oral: The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study »
Daniel Park · Jascha Sohl-Dickstein · Quoc Le · Samuel Smith -
2018 Poster: Understanding and Simplifying One-Shot Architecture Search »
Gabriel Bender · Pieter-Jan Kindermans · Barret Zoph · Vijay Vasudevan · Quoc Le -
2018 Poster: Learning Longer-term Dependencies in RNNs with Auxiliary Losses »
Trieu H Trinh · Andrew Dai · Thang Luong · Quoc Le -
2018 Oral: Learning Longer-term Dependencies in RNNs with Auxiliary Losses »
Trieu H Trinh · Andrew Dai · Thang Luong · Quoc Le -
2018 Oral: Understanding and Simplifying One-Shot Architecture Search »
Gabriel Bender · Pieter-Jan Kindermans · Barret Zoph · Vijay Vasudevan · Quoc Le -
2018 Poster: Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games? »
Maithra Raghu · Alexander Irpan · Jacob Andreas · Bobby Kleinberg · Quoc Le · Jon Kleinberg -
2018 Oral: Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games? »
Maithra Raghu · Alexander Irpan · Jacob Andreas · Bobby Kleinberg · Quoc Le · Jon Kleinberg -
2018 Poster: Efficient Neural Architecture Search via Parameters Sharing »
Hieu Pham · Melody Guan · Barret Zoph · Quoc Le · Jeff Dean -
2018 Oral: Efficient Neural Architecture Search via Parameters Sharing »
Hieu Pham · Melody Guan · Barret Zoph · Quoc Le · Jeff Dean -
2017 Poster: Large-Scale Evolution of Image Classifiers »
Esteban Real · Sherry Moore · Andrew Selle · Saurabh Saxena · Yutaka Leon Suematsu · Jie Tan · Quoc Le · Alexey Kurakin -
2017 Poster: Neural Optimizer Search using Reinforcement Learning »
Irwan Bello · Barret Zoph · Vijay Vasudevan · Quoc Le -
2017 Poster: Device Placement Optimization with Reinforcement Learning »
Azalia Mirhoseini · Hieu Pham · Quoc Le · benoit steiner · Mohammad Norouzi · Rasmus Larsen · Yuefeng Zhou · Naveen Kumar · Samy Bengio · Jeff Dean -
2017 Talk: Neural Optimizer Search using Reinforcement Learning »
Irwan Bello · Barret Zoph · Vijay Vasudevan · Quoc Le -
2017 Talk: Large-Scale Evolution of Image Classifiers »
Esteban Real · Sherry Moore · Andrew Selle · Saurabh Saxena · Yutaka Leon Suematsu · Jie Tan · Quoc Le · Alexey Kurakin -
2017 Talk: Device Placement Optimization with Reinforcement Learning »
Azalia Mirhoseini · Hieu Pham · Quoc Le · benoit steiner · Mohammad Norouzi · Rasmus Larsen · Yuefeng Zhou · Naveen Kumar · Samy Bengio · Jeff Dean