Prioritized Training on Points that are Learnable, Worth Learning, and not yet Learnt
Sören Mindermann · Jan Brauner · Muhammed Razzak · Mrinank Sharma · Andreas Kirsch · Winnie Xu · Benedikt Höltgen · Aidan Gomez · Adrien Morisot · Sebastian Farquhar · Yarin Gal

Thu Jul 21 08:50 AM -- 08:55 AM (PDT) @ Ballroom 1 & 2

Training on web-scale data can take months. But much computation and time is wasted on redundant and noisy points that are already learnt or not learnable. To accelerate training, we introduce Reducible Holdout Loss Selection (RHO-LOSS), a simple but principled technique which selects approximately those points for training that most reduce the model's generalization loss. As a result, RHO-LOSS mitigates the weaknesses of existing data selection methods: techniques from the optimization literature typically select "hard" (e.g. high loss) points, but such points are often noisy (not learnable) or less task-relevant. Conversely, curriculum learning prioritizes "easy" points, but such points need not be trained on once learned. In contrast, RHO-LOSS selects points that are learnable, worth learning, and not yet learnt. RHO-LOSS trains in far fewer steps than prior art, improves accuracy, and speeds up training on a wide range of datasets, hyperparameters, and architectures (MLPs, CNNs, and BERT). On the large web-scraped image dataset Clothing-1M, RHO-LOSS trains in 18x fewer steps and reaches 2% higher final accuracy than uniform data shuffling.
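The selection rule described in the abstract can be sketched in a few lines: for each candidate point, subtract an irreducible loss (the loss of a model trained only on holdout data) from the current model's training loss, and pick the points where the gap is largest. This is a minimal illustrative sketch, not the authors' implementation; the function name and the assumption that per-point losses are precomputed arrays are mine.

```python
import numpy as np

def rho_loss_select(train_losses, irreducible_losses, k):
    """Select the k points with the highest reducible holdout loss.

    reducible loss = current model's training loss
                   - irreducible loss (from a model trained on holdout data)

    A high training loss combined with a low irreducible loss marks a point
    that is learnable, worth learning, and not yet learnt. Noisy points have
    a high irreducible loss too, so their reducible loss stays small.
    """
    reducible = np.asarray(train_losses) - np.asarray(irreducible_losses)
    # indices of the k largest reducible losses, highest first
    return np.argsort(reducible)[-k:][::-1]

# Toy batch: point 0 is noisy (high irreducible loss), point 1 is already
# learnt (low train loss), point 2 is learnable but not yet learnt.
train = [2.0, 0.1, 2.5, 1.0]
irred = [1.9, 0.1, 0.2, 0.5]
selected = rho_loss_select(train, irred, 2)  # picks points 2 and 3
```

In practice the paper amortizes this by scoring each mini-batch and training only on the top-scoring fraction, so the irreducible losses are computed once and reused.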

Author Information

Sören Mindermann (University of Oxford)
Jan Brauner (University of Oxford)
Muhammed Razzak (University of Oxford)
PhD Student at the University of Oxford supervised by Yarin Gal in the OATML Group.

Mrinank Sharma (University of Oxford)
Andreas Kirsch (University of Oxford)
Winnie Xu (University of Toronto)
Winnie recently graduated with an H.BSc from the University of Toronto, where she majored in Computer Science and specialized in Artificial Intelligence. Her research interests span generative models with probabilistic interpretations and differentiable numerical algorithms. As an undergraduate, she researched latent variable models, variational inference, and Neural ODEs/SDEs with David Duvenaud. She is currently a student researcher at Google Brain, collaborating with Stanford University on efficient methods for training diffusion models and on Bayesian program induction with large language models for reasoning tasks. She has also collaborated with Nvidia Research, Oxford (OATML), and Cohere AI on topics in robotics, large language models, and NLP.

Benedikt Höltgen (University of Oxford)
Aidan Gomez (Google)
Adrien Morisot (Cohere)

Adrien is a representation learning person at Cohere. He enjoys reading, hiking, and talking about himself in the third person.

Sebastian Farquhar (University of Oxford)
Yarin Gal (University of Oxford)
