Timezone: »
Highly overparametrized neural networks can display curiously strong generalization performance -- a phenomenon that has recently garnered a wealth of theoretical and empirical research in order to better understand it. In contrast to most previous work, which typically considers the performance as a function of the model size, in this paper we empirically study the generalization performance as the size of the training set varies over multiple orders of magnitude. These systematic experiments lead to some interesting and potentially very useful observations; perhaps most notably that training on smaller subsets of the data can lead to more reliable model selection decisions whilst simultaneously enjoying smaller computational overheads. Our experiments furthermore allow us to estimate Minimum Description Lengths for common datasets given modern neural network architectures, thereby paving the way for principled model selection taking into account Occams-razor.
Author Information
Jorg Bornschein (DeepMind)
Francesco Visin (Deepmind)
Simon Osindero (DeepMind)
More from the Same Authors
-
2022 Poster: Retrieval-Augmented Reinforcement Learning »
Anirudh Goyal · Abe Friesen Friesen · Andrea Banino · Theophane Weber · Nan Rosemary Ke · Adrià Puigdomenech Badia · Arthur Guez · Mehdi Mirza · Peter Humphreys · Ksenia Konyushkova · Michal Valko · Simon Osindero · Timothy Lillicrap · Nicolas Heess · Charles Blundell -
2022 Spotlight: Retrieval-Augmented Reinforcement Learning »
Anirudh Goyal · Abe Friesen Friesen · Andrea Banino · Theophane Weber · Nan Rosemary Ke · Adrià Puigdomenech Badia · Arthur Guez · Mehdi Mirza · Peter Humphreys · Ksenia Konyushkova · Michal Valko · Simon Osindero · Timothy Lillicrap · Nicolas Heess · Charles Blundell -
2022 Poster: Model-Value Inconsistency as a Signal for Epistemic Uncertainty »
Angelos Filos · Eszter Vértes · Zita Marinho · Gregory Farquhar · Diana Borsa · Abe Friesen · Feryal Behbahani · Tom Schaul · Andre Barreto · Simon Osindero -
2022 Spotlight: Model-Value Inconsistency as a Signal for Epistemic Uncertainty »
Angelos Filos · Eszter Vértes · Zita Marinho · Gregory Farquhar · Diana Borsa · Abe Friesen · Feryal Behbahani · Tom Schaul · Andre Barreto · Simon Osindero -
2022 Poster: Improving Language Models by Retrieving from Trillions of Tokens »
Sebastian Borgeaud · Arthur Mensch · Jordan Hoffmann · Trevor Cai · Eliza Rutherford · Katie Millican · George van den Driessche · Jean-Baptiste Lespiau · Bogdan Damoc · Aidan Clark · Diego de Las Casas · Aurelia Guy · Jacob Menick · Roman Ring · Tom Hennigan · Saffron Huang · Loren Maggiore · Chris Jones · Albin Cassirer · Andy Brock · Michela Paganini · Geoffrey Irving · Oriol Vinyals · Simon Osindero · Karen Simonyan · Jack Rae · Erich Elsen · Laurent Sifre -
2022 Poster: Unified Scaling Laws for Routed Language Models »
Aidan Clark · Diego de Las Casas · Aurelia Guy · Arthur Mensch · Michela Paganini · Jordan Hoffmann · Bogdan Damoc · Blake Hechtman · Trevor Cai · Sebastian Borgeaud · George van den Driessche · Eliza Rutherford · Tom Hennigan · Matthew Johnson · Albin Cassirer · Chris Jones · Elena Buchatskaya · David Budden · Laurent Sifre · Simon Osindero · Oriol Vinyals · Marc'Aurelio Ranzato · Jack Rae · Erich Elsen · Koray Kavukcuoglu · Karen Simonyan -
2022 Spotlight: Improving Language Models by Retrieving from Trillions of Tokens »
Sebastian Borgeaud · Arthur Mensch · Jordan Hoffmann · Trevor Cai · Eliza Rutherford · Katie Millican · George van den Driessche · Jean-Baptiste Lespiau · Bogdan Damoc · Aidan Clark · Diego de Las Casas · Aurelia Guy · Jacob Menick · Roman Ring · Tom Hennigan · Saffron Huang · Loren Maggiore · Chris Jones · Albin Cassirer · Andy Brock · Michela Paganini · Geoffrey Irving · Oriol Vinyals · Simon Osindero · Karen Simonyan · Jack Rae · Erich Elsen · Laurent Sifre -
2022 Oral: Unified Scaling Laws for Routed Language Models »
Aidan Clark · Diego de Las Casas · Aurelia Guy · Arthur Mensch · Michela Paganini · Jordan Hoffmann · Bogdan Damoc · Blake Hechtman · Trevor Cai · Sebastian Borgeaud · George van den Driessche · Eliza Rutherford · Tom Hennigan · Matthew Johnson · Albin Cassirer · Chris Jones · Elena Buchatskaya · David Budden · Laurent Sifre · Simon Osindero · Oriol Vinyals · Marc'Aurelio Ranzato · Jack Rae · Erich Elsen · Koray Kavukcuoglu · Karen Simonyan -
2018 Poster: Mix & Match - Agent Curricula for Reinforcement Learning »
Wojciech Czarnecki · Siddhant Jayakumar · Max Jaderberg · Leonard Hasenclever · Yee Teh · Nicolas Heess · Simon Osindero · Razvan Pascanu -
2018 Oral: Mix & Match - Agent Curricula for Reinforcement Learning »
Wojciech Czarnecki · Siddhant Jayakumar · Max Jaderberg · Leonard Hasenclever · Yee Teh · Nicolas Heess · Simon Osindero · Razvan Pascanu -
2017 Poster: FeUdal Networks for Hierarchical Reinforcement Learning »
Alexander Vezhnevets · Simon Osindero · Tom Schaul · Nicolas Heess · Max Jaderberg · David Silver · Koray Kavukcuoglu -
2017 Talk: FeUdal Networks for Hierarchical Reinforcement Learning »
Alexander Vezhnevets · Simon Osindero · Tom Schaul · Nicolas Heess · Max Jaderberg · David Silver · Koray Kavukcuoglu -
2017 Poster: Decoupled Neural Interfaces using Synthetic Gradients »
Max Jaderberg · Wojciech Czarnecki · Simon Osindero · Oriol Vinyals · Alex Graves · David Silver · Koray Kavukcuoglu -
2017 Poster: Understanding Synthetic Gradients and Decoupled Neural Interfaces »
Wojciech Czarnecki · Grzegorz Świrszcz · Max Jaderberg · Simon Osindero · Oriol Vinyals · Koray Kavukcuoglu -
2017 Talk: Understanding Synthetic Gradients and Decoupled Neural Interfaces »
Wojciech Czarnecki · Grzegorz Świrszcz · Max Jaderberg · Simon Osindero · Oriol Vinyals · Koray Kavukcuoglu -
2017 Talk: Decoupled Neural Interfaces using Synthetic Gradients »
Max Jaderberg · Wojciech Czarnecki · Simon Osindero · Oriol Vinyals · Alex Graves · David Silver · Koray Kavukcuoglu