Poster
Unified Scaling Laws for Routed Language Models
Aidan Clark · Diego de Las Casas · Aurelia Guy · Arthur Mensch · Michela Paganini · Jordan Hoffmann · Bogdan Damoc · Blake Hechtman · Trevor Cai · Sebastian Borgeaud · George van den Driessche · Eliza Rutherford · Tom Hennigan · Matthew Johnson · Albin Cassirer · Chris Jones · Elena Buchatskaya · David Budden · Laurent Sifre · Simon Osindero · Oriol Vinyals · Marc'Aurelio Ranzato · Jack Rae · Erich Elsen · Koray Kavukcuoglu · Karen Simonyan

Tue Jul 19 03:30 PM -- 05:30 PM (PDT) @ Hall E #304

The performance of a language model has been shown to be effectively modeled as a power-law in its parameter count. Here we study the scaling behaviors of Routing Networks: architectures that conditionally use only a subset of their parameters while processing an input. For these models, parameter count and computational requirement form two independent axes along which an increase leads to better performance. In this work we derive and justify scaling laws defined on these two variables which generalize those known for standard language models and describe the performance of a wide range of routing architectures trained via three different techniques. Afterwards we provide two applications of these laws: first deriving an Effective Parameter Count along which all models scale at the same rate, and then using the scaling coefficients to give a quantitative comparison of the three routing techniques considered. Our analysis derives from an extensive evaluation of Routing Networks across five orders of magnitude of size, including models with hundreds of experts and hundreds of billions of parameters.
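The central object here is a joint scaling law in parameter count N and expert count E. As a rough illustration only (not the authors' code), the sketch below fits a law of the bilinear form log L = a·log N + b·log E + c·log N·log E + d, the general shape the abstract describes, to made-up data; the data values, coefficient names, and helper function are all illustrative assumptions.

    # Minimal sketch, not the authors' implementation: least-squares fit of
    # log L = a*log N + b*log E + c*log N*log E + d, a bilinear scaling law
    # in parameter count N and expert count E. All data below are made up.
    import numpy as np

    def design_matrix(log_n, log_e):
        # One row per model: [log N, log E, log N * log E, 1].
        return np.stack([log_n, log_e, log_n * log_e, np.ones_like(log_n)], axis=1)

    # Hypothetical (N, E, loss) measurements purely for illustration.
    N = np.array([1e7, 1e8, 1e9, 1e7, 1e8, 1e9])
    E = np.array([1.0, 1.0, 1.0, 64.0, 64.0, 64.0])
    L = np.array([4.2, 3.6, 3.1, 3.8, 3.3, 2.9])

    X = design_matrix(np.log(N), np.log(E))
    (a, b, c, d), *_ = np.linalg.lstsq(X, np.log(L), rcond=None)
    print(f"log L ~ {a:.3f} log N + {b:.3f} log E + {c:.3f} log N log E + {d:.3f}")

    # An Effective Parameter Count N_bar is the dense (E = 1) size with the
    # same predicted loss: a*log(N_bar) + d = a*log N + (b + c*log N)*log E + d,
    # i.e. log(N_bar) = log N + ((b + c*log N) / a) * log E.
    log_nbar = np.log(N) + ((b + c * np.log(N)) / a) * np.log(E)
    print("effective parameter counts:", np.exp(log_nbar))

The paper itself fits a richer form than this (for instance, handling saturation at large expert counts); the sketch above keeps only the core bilinear term to show how such a law and an Effective Parameter Count can be derived from it.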

Author Information

Aidan Clark (OpenAI)
Diego de Las Casas (DeepMind)
Aurelia Guy (Google Inc.)
Arthur Mensch (DeepMind)
Michela Paganini (DeepMind)
Jordan Hoffmann (DeepMind)
Bogdan Damoc (DeepMind)
Blake Hechtman (Google)
Trevor Cai (DeepMind)
Sebastian Borgeaud (DeepMind)
George van den Driessche (DeepMind)
Eliza Rutherford (DeepMind)
Tom Hennigan (DeepMind)
Matthew Johnson (Google Brain)
Albin Cassirer (DeepMind)
Chris Jones (DeepMind)
Elena Buchatskaya (DeepMind)
David Budden (DeepMind)
Laurent Sifre (DeepMind)
Simon Osindero (DeepMind)
Oriol Vinyals (Google DeepMind)

Oriol Vinyals is a Research Scientist at Google. He works in deep learning with the Google Brain team. Oriol holds a Ph.D. in EECS from the University of California, Berkeley, and a master's degree from the University of California, San Diego. He is a recipient of the 2011 Microsoft Research PhD Fellowship. He was an early adopter of the new deep learning wave at Berkeley, and in his thesis he focused on non-convex optimization and recurrent neural networks. At Google Brain he continues working on his areas of interest, which include artificial intelligence, with particular emphasis on machine learning, language, and vision.

Marc'Aurelio Ranzato (DeepMind)
Jack Rae (DeepMind)
Erich Elsen (Google)
Koray Kavukcuoglu (DeepMind)
Karen Simonyan (Inflection AI)
