Training on web-scale data can take months. But much computation and time is wasted on redundant and noisy points that are already learnt or not learnable. To accelerate training, we introduce Reducible Holdout Loss Selection (RHO-LOSS), a simple but principled technique which selects approximately those points for training that most reduce the model's generalization loss. As a result, RHO-LOSS mitigates the weaknesses of existing data selection methods: techniques from the optimization literature typically select "hard" (e.g. high loss) points, but such points are often noisy (not learnable) or less task-relevant. Conversely, curriculum learning prioritizes "easy" points, but such points need not be trained on once learned. In contrast, RHO-LOSS selects points that are learnable, worth learning, and not yet learnt. RHO-LOSS trains in far fewer steps than prior art, improves accuracy, and speeds up training on a wide range of datasets, hyperparameters, and architectures (MLPs, CNNs, and BERT). On the large web-scraped image dataset Clothing-1M, RHO-LOSS trains in 18x fewer steps and reaches 2% higher final accuracy than uniform data shuffling.
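In essence, the selection rule ranks candidate points by their reducible holdout loss: the current model's training loss minus the loss of a model trained only on holdout data (the "irreducible" loss). A minimal NumPy sketch of this ranking step, assuming per-point losses have already been computed (the function name and toy numbers are illustrative, not from the paper's code):

```python
import numpy as np

def rho_loss_select(train_losses, holdout_losses, k):
    """Select the k points with the highest reducible holdout loss.

    train_losses:   per-point loss under the current training model.
    holdout_losses: per-point loss under a model trained only on holdout
                    data -- the "irreducible" loss. Noisy points keep a high
                    irreducible loss, and already-learnt points have a low
                    training loss, so both receive low scores.
    """
    reducible = np.asarray(train_losses) - np.asarray(holdout_losses)
    return np.argsort(reducible)[-k:][::-1]  # indices, highest score first

# Toy example: point 0 is noisy (high loss under both models), point 1 is
# already learnt (low loss under both), point 2 is learnable but not yet
# learnt (high training loss, low irreducible loss).
train = [2.0, 0.1, 1.5]
hold = [1.9, 0.1, 0.2]
selected = rho_loss_select(train, hold, k=1)  # -> array([2])
```

In practice the scores would be computed over a large pre-sampled batch each step, and only the top-k points would receive a gradient update.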
Author Information
Sören Mindermann (University of Oxford)
Jan Brauner (University of Oxford)
Muhammed Razzak (University of Oxford)

PhD Student at the University of Oxford supervised by Yarin Gal in the OATML Group.
Mrinank Sharma (University of Oxford)
Andreas Kirsch (University of Oxford)
Winnie Xu (University of Toronto)

Winnie recently graduated with an H.BSc from the University of Toronto, where she majored in Computer Science and specialized in Artificial Intelligence. Her research interests span generative models with probabilistic interpretations and differentiable numerical algorithms. As an undergraduate, she researched latent variable models, variational inference, and Neural ODEs/SDEs with David Duvenaud. She is currently a student researcher at Google Brain, collaborating with Stanford University on efficient methods for training diffusion models and on Bayesian program induction with large language models for reasoning tasks. She has also collaborated with Nvidia Research, Oxford (OATML), and Cohere AI on topics in robotics, large language models, and NLP.
Benedikt Höltgen (University of Oxford)
Aidan Gomez (Google)
Adrien Morisot (Cohere)
Adrien is a representation learning person at Cohere. He enjoys reading, hiking, and talking about himself in the third person.
Sebastian Farquhar (University of Oxford)
Yarin Gal (University of Oxford)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Spotlight: Prioritized Training on Points that are Learnable, Worth Learning, and not yet Learnt »
Thu. Jul 21st 03:50 -- 03:55 PM Room Ballroom 1 & 2
More from the Same Authors
-
2021 : GoldiProx Selection: Faster training by learning what is learnable, not yet learned, and worth learning »
Sören Mindermann · Muhammed Razzak · Adrien Morisot · Aidan Gomez · Sebastian Farquhar · Jan Brauner · Yarin Gal -
2021 : On Low Rank Training of Deep Neural Networks »
Siddhartha Kamalakara · Acyr Locatelli · Bharat Venkitesh · Jimmy Ba · Yarin Gal · Aidan Gomez -
2022 : P23: Language Model Cascades »
David Dohan · Winnie Xu -
2022 : [Poster] Self–Similarity Priors: Neural Collages as Differentiable Fractal Representations »
Winnie Xu -
2022 : Plex: Towards Reliability using Pretrained Large Model Extensions »
Dustin Tran · Andreas Kirsch · Balaji Lakshminarayanan · Huiyi Hu · Du Phan · D. Sculley · Jasper Snoek · Jeremiah Liu · JIE REN · Joost van Amersfoort · Kehang Han · Estefany Kelly Buchanan · Kevin Murphy · Mark Collier · Michael Dusenberry · Neil Band · Nithum Thain · Rodolphe Jenatton · Tim G. J Rudner · Yarin Gal · Zachary Nado · Zelda Mariet · Zi Wang · Zoubin Ghahramani -
2023 : Early Experiments in Scalable Dataset Selection for Self-Supervised Learning in Geospatial Imagery Models »
Muhammed Razzak · Anthony Ortiz · Caleb Robinson -
2023 : Black-Box Batch Active Learning for Regression »
Andreas Kirsch -
2023 : BatchGFN: Generative Flow Networks for Batch Active Learning »
Shreshth Malik · Salem Lahlou · Andrew Jesson · Moksh Jain · Nikolay Malkin · Tristan Deleu · Yoshua Bengio · Yarin Gal -
2023 : CLAM: Selective Clarification for Ambiguous Questions with Generative Language Models »
Lorenz Kuhn · Yarin Gal · Sebastian Farquhar -
2023 Poster: DiscoBAX - Discovery of optimal intervention sets in genomic experiment design »
Clare Lyle · Arash Mehrjou · Pascal Notin · Andrew Jesson · Stefan Bauer · Yarin Gal · Patrick Schwab -
2023 Poster: Differentiable Multi-Target Causal Bayesian Experimental Design »
Panagiotis Tigas · Yashas Annadani · Desi Ivanova · Andrew Jesson · Yarin Gal · Adam Foster · Stefan Bauer -
2022 : Contributed Spotlight Talks: Part 1 »
David Dohan · Winnie Xu · Sugandha Sharma · Tan Zhi-Xuan -
2022 Poster: Learning Dynamics and Generalization in Deep Reinforcement Learning »
Clare Lyle · Mark Rowland · Will Dabney · Marta Kwiatkowska · Yarin Gal -
2022 Spotlight: Learning Dynamics and Generalization in Deep Reinforcement Learning »
Clare Lyle · Mark Rowland · Will Dabney · Marta Kwiatkowska · Yarin Gal -
2022 : Poster Session 2 »
Asra Aslam · Sowmya Vijayakumar · Heta Gandhi · Mary Adewunmi · You Cheng · Tong Yang · Kristina Ulicna · · Weiwei Zong · Narmada Naik · Akshata Tiwari · Ambreen Hamadani · Mayuree Binjolkar · Charupriya Sharma · Chhavi Yadav · Yu Yang · Winnie Xu · QINGQING ZHAO · Julissa Giuliana Villanueva Llerena · Lilian Mkonyi · Berthine Nyunga Mpinda · Rehema Mwawado · Tooba Imtiaz · Desi Ivanova · Emma Johanna Mikaela Petersson Svensson · Angela Bitto-Nemling · Elisabeth Rumetshofer · Ana Sanchez Fernandez · Garima Giri · Sigrid Passano Hellan · Catherine Ordun · Vasiliki Tassopoulou · Gina Wong -
2022 : Poster Session 1 »
Asra Aslam · Sowmya Vijayakumar · Heta Gandhi · Mary Adewunmi · You Cheng · Tong Yang · Kristina Ulicna · · Weiwei Zong · Narmada Naik · Akshata Tiwari · Ambreen Hamadani · Mayuree Binjolkar · Charupriya Sharma · Chhavi Yadav · Yu Yang · Winnie Xu · QINGQING ZHAO · Julissa Giuliana Villanueva Llerena · Lilian Mkonyi · Berthine Nyunga Mpinda · Rehema Mwawado · Tooba Imtiaz · Desi Ivanova · Emma Johanna Mikaela Petersson Svensson · Angela Bitto-Nemling · Elisabeth Rumetshofer · Ana Sanchez Fernandez · Garima Giri · Sigrid Passano Hellan · Catherine Ordun · Vasiliki Tassopoulou · Gina Wong -
2021 Poster: Active Testing: Sample-Efficient Model Evaluation »
Jannik Kossen · Sebastian Farquhar · Yarin Gal · Tom Rainforth -
2021 Spotlight: Active Testing: Sample-Efficient Model Evaluation »
Jannik Kossen · Sebastian Farquhar · Yarin Gal · Tom Rainforth -
2018 Poster: Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam »
Mohammad Emtiyaz Khan · Didrik Nielsen · Voot Tangkaratt · Wu Lin · Yarin Gal · Akash Srivastava -
2018 Oral: Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam »
Mohammad Emtiyaz Khan · Didrik Nielsen · Voot Tangkaratt · Wu Lin · Yarin Gal · Akash Srivastava