Poster
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Nan Du · Yanping Huang · Andrew Dai · Simon Tong · Dmitry Lepikhin · Yuanzhong Xu · Maxim Krikun · Yanqi Zhou · Adams Wei Yu · Orhan Firat · Barret Zoph · William Fedus · Maarten Bosma · Zongwei Zhou · Tao Wang · Emma Wang · Kellie Webster · Marie Pellat · Kevin Robinson · Kathleen Meier-Hellstern · Toju Duke · Lucas Dixon · Kun Zhang · Quoc Le · Yonghui Wu · Zhifeng Chen · Claire Cui

Thu Jul 21 03:00 PM -- 05:00 PM (PDT) @ Hall E #208

Scaling language models with more data, compute and parameters has driven significant progress in natural language processing. For example, thanks to scaling, GPT-3 was able to achieve strong results on in-context learning tasks. However, training these large dense models requires significant amounts of computing resources. In this paper, we propose and develop a family of language models named GLaM (Generalist Language Model), which uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring substantially less training cost compared to dense variants. The largest GLaM has 1.2 trillion parameters, which is approximately 7x larger than GPT-3. It consumes only 1/3 of the energy used to train GPT-3 and requires half of the computation FLOPs for inference, while still achieving better overall few-shot performance across 29 NLP tasks.
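To make the "sparsely activated mixture-of-experts" idea concrete, below is a minimal sketch (not the authors' implementation) of a MoE feed-forward layer with top-2 gating, the general mechanism GLaM uses in place of dense FFN blocks. All names, shapes, and the toy sizes are illustrative assumptions; a production system would also need expert parallelism and load-balancing losses, which are omitted here.

```python
# Minimal top-2 gated mixture-of-experts FFN sketch in JAX (illustrative only).
import jax
import jax.numpy as jnp

def moe_ffn(x, gate_w, expert_w_in, expert_w_out, k=2):
    """x: [tokens, d_model]; gate_w: [d_model, n_experts];
    expert_w_in: [n_experts, d_model, d_ff]; expert_w_out: [n_experts, d_ff, d_model]."""
    # Router: each token scores every expert and keeps only the top-k.
    logits = x @ gate_w                               # [tokens, n_experts]
    top_vals, top_idx = jax.lax.top_k(logits, k)      # [tokens, k]
    weights = jax.nn.softmax(top_vals, axis=-1)       # renormalize over the chosen experts

    out = jnp.zeros_like(x)
    for slot in range(k):
        idx = top_idx[:, slot]                        # chosen expert id per token
        w_in = expert_w_in[idx]                       # gather that expert's weights
        w_out = expert_w_out[idx]
        h = jax.nn.relu(jnp.einsum('td,tdf->tf', x, w_in))
        out = out + weights[:, slot:slot + 1] * jnp.einsum('tf,tfd->td', h, w_out)
    return out

# Toy usage: 4 tokens, 8 experts, only 2 experts run per token, so per-token
# compute stays roughly constant even as the number of experts (and total
# parameter count) grows -- the core of the training-cost argument above.
k0, k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 4)
tokens, d_model, d_ff, n_experts = 4, 16, 32, 8
x = jax.random.normal(k0, (tokens, d_model))
gate_w = jax.random.normal(k1, (d_model, n_experts)) * 0.02
w_in = jax.random.normal(k2, (n_experts, d_model, d_ff)) * 0.02
w_out = jax.random.normal(k3, (n_experts, d_ff, d_model)) * 0.02
print(moe_ffn(x, gate_w, w_in, w_out).shape)          # (4, 16)
```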

Author Information

Nan Du (Google)
Yanping Huang (Google Brain)
Andrew Dai (Google)

Andrew Dai was awarded an MA in Computer Science at the University of Cambridge before receiving a PhD in Informatics at the University of Edinburgh for text modeling with Bayesian nonparametrics. He subsequently worked at Google in Mountain View, California, in a range of teams including machine translation, Google Now and Google Ads. In 2014, he joined the Google Brain team, where he has worked on text representations, semi-supervised learning, sequence models, adversarial training and deep learning on medical data.

Simon Tong (Google Brain)
Dmitry Lepikhin (Google)
Yuanzhong Xu (Google)
Maxim Krikun (Google)
Yanqi Zhou (Google)
Adams Wei Yu (Google Brain)
Orhan Firat (Google)
Barret Zoph (Google)
William Fedus (Google Brain)
Maarten Bosma (Google)
Zongwei Zhou (Google Inc.)
Tao Wang (Google Inc.)
Emma Wang (Google)
Kellie Webster (Google)
Marie Pellat (Google)
Kevin Robinson (Google)
Kathleen Meier-Hellstern (Google)

Kathy is a Principal Engineer and Director in Google Research, serving as the Responsible AI Tech Lead for Google's large language and multimodal models. Her research mission is to create scalable tools, data and processes for evaluating and improving Responsible AI in ML models and products. Kathy was previously a Principal Site Reliability Engineer at Google, focused on improving the end-to-end client experience in YouTube and Ads. Before joining Google, Kathy was Assistant Vice President of Optimization, Reliability & Customer Analytics (ORCA) in AT&T Labs, responsible for delivering enhanced analytic tools and software for AT&T's Next Generation networks. Kathy is an AT&T Fellow, and holds a Ph.D. and Master's degree in Operations Research from the University of Delaware.

Toju Duke (Google)
Lucas Dixon (Google)
Kun Zhang (Google)
Quoc Le (Google Brain)
Yonghui Wu (Google)
Zhifeng Chen (Google)
Claire Cui (Google)
