The Evolved Transformer
David So · Quoc Le · Chen Liang

Tue Jun 11th 05:05 -- 05:10 PM @ Hall A

Recent work has highlighted the strengths of the Transformer architecture on sequence tasks. At the same time, neural architecture search (NAS) has advanced to the point where it can outperform human-designed models. The goal of this work is to use NAS to design a better Transformer architecture. We first construct a large search space inspired by recent advances in feed-forward sequence models and then run an evolutionary architecture search, seeding the initial population with the Transformer. To run this search effectively on the computationally expensive WMT 2014 English-German translation task, we develop the progressive dynamic hurdles (PDH) method, which allows us to dynamically allocate more resources to more promising candidate models. The architecture found in our experiments, the Evolved Transformer (ET), demonstrates consistent improvement over the Transformer on four well-established language tasks: WMT 2014 English-German, WMT 2014 English-French, WMT 2014 English-Czech, and LM1B. At a big model size, the Evolved Transformer is twice as efficient as the Transformer in terms of FLOPs without loss in quality. At a much smaller, mobile-friendly model size of ~7M parameters, the Evolved Transformer outperforms the Transformer by 0.8 BLEU on WMT 2014 English-German.
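The resource-allocation idea behind progressive dynamic hurdles can be illustrated with a minimal sketch. It is a simplified batch variant, not the paper's exact procedure: all candidates train together in stages, and after each stage only those whose fitness clears a hurdle keep training. Here the hurdle is assumed to be the mean fitness of the candidates still in the running. `Candidate`, `progressive_dynamic_hurdles`, `toy_train_and_eval`, and the toy quality scores are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Candidate:
    """Hypothetical stand-in for one encoded candidate architecture."""
    name: str
    steps_trained: int = 0
    fitness: float = float("-inf")


def progressive_dynamic_hurdles(
    candidates: List[Candidate],
    train_and_eval: Callable[[Candidate, int], float],
    stage_steps: List[int],
) -> List[Candidate]:
    """Simplified batch variant of PDH (an assumption, not the paper's exact loop).

    stage_steps[i] is the number of extra training steps granted in stage i.
    After every stage except the last, the hurdle is the mean fitness of the
    current survivors, and only candidates at or above it keep training.
    """
    survivors = list(candidates)
    for stage, extra_steps in enumerate(stage_steps):
        for c in survivors:
            c.steps_trained += extra_steps
            c.fitness = train_and_eval(c, c.steps_trained)
        if stage == len(stage_steps) - 1:
            break  # no hurdle after the final stage
        hurdle = mean(c.fitness for c in survivors)
        survivors = [c for c in survivors if c.fitness >= hurdle]
    return survivors


# Toy "true" quality per candidate; the seed mirrors starting from the Transformer.
QUALITY: Dict[str, float] = {
    "transformer_seed": 0.80,
    "child_a": 0.90,
    "child_b": 0.60,
    "child_c": 0.75,
}


def toy_train_and_eval(c: Candidate, total_steps: int) -> float:
    # Hypothetical learning curve: fitness rises toward QUALITY[name] with more steps.
    return QUALITY[c.name] * total_steps / (total_steps + 100)


population = [Candidate(name) for name in QUALITY]
winners = progressive_dynamic_hurdles(population, toy_train_and_eval, [100, 200, 400])
print([c.name for c in winners])  # ['child_a']
print({c.name: c.steps_trained for c in population})
```

With these toy numbers, `child_b` and `child_c` stop after 100 steps and `transformer_seed` after 300, so the run spends 1,200 training steps in total instead of the 2,800 needed to train all four candidates to 700 steps.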

Author Information

David So (Google Brain)
Quoc Le (Google Brain)
Chen Liang (Google Brain)
