Timezone: »
We introduce a new framework for sample-efficient model evaluation that we call active testing. While approaches like active learning reduce the number of labels needed for model training, existing literature largely ignores the cost of labeling test data, typically unrealistically assuming large test sets for model evaluation. This creates a disconnect to real applications, where test labels are important and just as expensive, e.g. for optimizing hyperparameters. Active testing addresses this by carefully selecting the test points to label, ensuring model evaluation is sample-efficient. To this end, we derive theoretically-grounded and intuitive acquisition strategies that are specifically tailored to the goals of active testing, noting these are distinct to those of active learning. As actively selecting labels introduces a bias; we further show how to remove this bias while reducing the variance of the estimator at the same time. Active testing is easy to implement and can be applied to any supervised machine learning method. We demonstrate its effectiveness on models including WideResNets and Gaussian processes on datasets including Fashion-MNIST and CIFAR-100.
Author Information
Jannik Kossen (University of Oxford)
Sebastian Farquhar (University of Oxford)
Yarin Gal (University of Oxford)
Tom Rainforth (University of Oxford)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Poster: Active Testing: Sample-Efficient Model Evaluation »
Thu. Jul 22nd 04:00 -- 06:00 PM Room Virtual
More from the Same Authors
-
2021 : A Practical Notation for Information-Theoretic Quantities between Outcomes and Random Variables »
Andreas Kirsch · Yarin Gal -
2021 : GoldiProx Selection: Faster training by learning what is learnable, not yet learned, and worth learning »
Sören Mindermann · Muhammed Razzak · Adrien Morisot · Aidan Gomez · Sebastian Farquhar · Jan Brauner · Yarin Gal -
2021 : Active Learning under Pool Set Distribution Shift and Noisy Data »
Andreas Kirsch · Tom Rainforth · Yarin Gal -
2021 : Batch Active Learning with Stochastic Acquisition Functions »
Andreas Kirsch · Sebastian Farquhar · Yarin Gal -
2021 : On Low Rank Training of Deep Neural Networks »
Siddhartha Kamalakara · Acyr Locatelli · Bharat Venkitesh · Jimmy Ba · Yarin Gal · Aidan Gomez -
2021 : Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data »
Andrew Jesson · Panagiotis Tigas · Joost van Amersfoort · Andreas Kirsch · Uri Shalit · Yarin Gal -
2021 : A Simple Baseline for Batch Active Learning with Stochastic Acquisition Functions »
Andreas Kirsch · Sebastian Farquhar · Yarin Gal -
2021 : Active Learning under Pool Set Distribution Shift and Noisy Data »
Andreas Kirsch · Tom Rainforth · Yarin Gal -
2022 : Plex: Towards Reliability using Pretrained Large Model Extensions »
Dustin Tran · Andreas Kirsch · Balaji Lakshminarayanan · Huiyi Hu · Du Phan · D. Sculley · Jasper Snoek · Jeremiah Liu · Jie Ren · Joost van Amersfoort · Kehang Han · E. Kelly Buchanan · Kevin Murphy · Mark Collier · Mike Dusenberry · Neil Band · Nithum Thain · Rodolphe Jenatton · Tim G. J Rudner · Yarin Gal · Zachary Nado · Zelda Mariet · Zi Wang · Zoubin Ghahramani -
2022 : Plex: Towards Reliability using Pretrained Large Model Extensions »
Dustin Tran · Andreas Kirsch · Balaji Lakshminarayanan · Huiyi Hu · Du Phan · D. Sculley · Jasper Snoek · Jeremiah Liu · JIE REN · Joost van Amersfoort · Kehang Han · Estefany Kelly Buchanan · Kevin Murphy · Mark Collier · Michael Dusenberry · Neil Band · Nithum Thain · Rodolphe Jenatton · Tim G. J Rudner · Yarin Gal · Zachary Nado · Zelda Mariet · Zi Wang · Zoubin Ghahramani -
2022 : Plex: Towards Reliability using Pretrained Large Model Extensions »
Dustin Tran · Andreas Kirsch · Balaji Lakshminarayanan · Huiyi Hu · Du Phan · D. Sculley · Jasper Snoek · Jeremiah Liu · JIE REN · Joost van Amersfoort · Kehang Han · Estefany Kelly Buchanan · Kevin Murphy · Mark Collier · Michael Dusenberry · Neil Band · Nithum Thain · Rodolphe Jenatton · Tim G. J Rudner · Yarin Gal · Zachary Nado · Zelda Mariet · Zi Wang · Zoubin Ghahramani -
2022 Poster: Learning Dynamics and Generalization in Deep Reinforcement Learning »
Clare Lyle · Mark Rowland · Will Dabney · Marta Kwiatkowska · Yarin Gal -
2022 Poster: Continual Learning via Sequential Function-Space Variational Inference »
Tim G. J Rudner · Freddie Bickford Smith · QIXUAN FENG · Yee-Whye Teh · Yarin Gal -
2022 Poster: Prioritized Training on Points that are Learnable, Worth Learning, and not yet Learnt »
Sören Mindermann · Jan Brauner · Muhammed Razzak · Mrinank Sharma · Andreas Kirsch · Winnie Xu · Benedikt Höltgen · Aidan Gomez · Adrien Morisot · Sebastian Farquhar · Yarin Gal -
2022 Spotlight: Learning Dynamics and Generalization in Deep Reinforcement Learning »
Clare Lyle · Mark Rowland · Will Dabney · Marta Kwiatkowska · Yarin Gal -
2022 Spotlight: Prioritized Training on Points that are Learnable, Worth Learning, and not yet Learnt »
Sören Mindermann · Jan Brauner · Muhammed Razzak · Mrinank Sharma · Andreas Kirsch · Winnie Xu · Benedikt Höltgen · Aidan Gomez · Adrien Morisot · Sebastian Farquhar · Yarin Gal -
2022 Spotlight: Continual Learning via Sequential Function-Space Variational Inference »
Tim G. J Rudner · Freddie Bickford Smith · QIXUAN FENG · Yee-Whye Teh · Yarin Gal -
2022 Poster: Tranception: Protein Fitness Prediction with Autoregressive Transformers and Inference-time Retrieval »
Pascal Notin · Mafalda Dias · Jonathan Frazer · Javier Marchena Hurtado · Aidan Gomez · Debora Marks · Yarin Gal -
2022 Spotlight: Tranception: Protein Fitness Prediction with Autoregressive Transformers and Inference-time Retrieval »
Pascal Notin · Mafalda Dias · Jonathan Frazer · Javier Marchena Hurtado · Aidan Gomez · Debora Marks · Yarin Gal -
2021 : Active Learning under Pool Set Distribution Shift and Noisy Data »
Yarin Gal · Tom Rainforth · Andreas Kirsch -
2021 : Invited Talk #1 »
Yarin Gal -
2021 : Live Panel Discussion »
Thomas Dietterich · Chelsea Finn · Kamalika Chaudhuri · Yarin Gal · Uri Shalit -
2021 Poster: Deep Adaptive Design: Amortizing Sequential Bayesian Experimental Design »
Adam Foster · Desi Ivanova · ILYAS MALIK · Tom Rainforth -
2021 Poster: On Signal-to-Noise Ratio Issues in Variational Inference for Deep Gaussian Processes »
Tim G. J. Rudner · Oscar Key · Yarin Gal · Tom Rainforth -
2021 Oral: Deep Adaptive Design: Amortizing Sequential Bayesian Experimental Design »
Adam Foster · Desi Ivanova · ILYAS MALIK · Tom Rainforth -
2021 Spotlight: On Signal-to-Noise Ratio Issues in Variational Inference for Deep Gaussian Processes »
Tim G. J. Rudner · Oscar Key · Yarin Gal · Tom Rainforth -
2021 Poster: Probabilistic Programs with Stochastic Conditioning »
David Tolpin · Yuan Zhou · Tom Rainforth · Hongseok Yang -
2021 Poster: Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding »
Andrew Jesson · Sören Mindermann · Yarin Gal · Uri Shalit -
2021 Spotlight: Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding »
Andrew Jesson · Sören Mindermann · Yarin Gal · Uri Shalit -
2021 Spotlight: Probabilistic Programs with Stochastic Conditioning »
David Tolpin · Yuan Zhou · Tom Rainforth · Hongseok Yang -
2021 Poster: PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning »
Angelos Filos · Clare Lyle · Yarin Gal · Sergey Levine · Natasha Jaques · Gregory Farquhar -
2021 Oral: PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning »
Angelos Filos · Clare Lyle · Yarin Gal · Sergey Levine · Natasha Jaques · Gregory Farquhar -
2020 : "Designing Bayesian-Optimal Experiments with Stochastic Gradients" »
Tom Rainforth -
2020 Poster: Inter-domain Deep Gaussian Processes »
Tim G. J. Rudner · Dino Sejdinovic · Yarin Gal -
2020 Poster: Divide, Conquer, and Combine: a New Inference Strategy for Probabilistic Programs with Stochastic Support »
Yuan Zhou · Hongseok Yang · Yee-Whye Teh · Tom Rainforth -
2020 Poster: Can Autonomous Vehicles Identify, Recover From, and Adapt to Distribution Shifts? »
Angelos Filos · Panagiotis Tigas · Rowan McAllister · Nicholas Rhinehart · Sergey Levine · Yarin Gal -
2020 Poster: Invariant Causal Prediction for Block MDPs »
Amy Zhang · Clare Lyle · Shagun Sodhani · Angelos Filos · Marta Kwiatkowska · Joelle Pineau · Yarin Gal · Doina Precup -
2020 Poster: Uncertainty Estimation Using a Single Deep Deterministic Neural Network »
Joost van Amersfoort · Lewis Smith · Yee-Whye Teh · Yarin Gal -
2019 Poster: Disentangling Disentanglement in Variational Autoencoders »
Emile Mathieu · Tom Rainforth · N Siddharth · Yee-Whye Teh -
2019 Oral: Disentangling Disentanglement in Variational Autoencoders »
Emile Mathieu · Tom Rainforth · N Siddharth · Yee-Whye Teh -
2019 Poster: Amortized Monte Carlo Integration »
Adam Golinski · Frank Wood · Tom Rainforth -
2019 Oral: Amortized Monte Carlo Integration »
Adam Golinski · Frank Wood · Tom Rainforth -
2018 Poster: On Nesting Monte Carlo Estimators »
Tom Rainforth · Rob Cornish · Hongseok Yang · andrew warrington · Frank Wood -
2018 Oral: On Nesting Monte Carlo Estimators »
Tom Rainforth · Rob Cornish · Hongseok Yang · andrew warrington · Frank Wood -
2018 Poster: Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam »
Mohammad Emtiyaz Khan · Didrik Nielsen · Voot Tangkaratt · Wu Lin · Yarin Gal · Akash Srivastava -
2018 Oral: Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam »
Mohammad Emtiyaz Khan · Didrik Nielsen · Voot Tangkaratt · Wu Lin · Yarin Gal · Akash Srivastava -
2018 Poster: Tighter Variational Bounds are Not Necessarily Better »
Tom Rainforth · Adam Kosiorek · Tuan Anh Le · Chris Maddison · Maximilian Igl · Frank Wood · Yee-Whye Teh -
2018 Oral: Tighter Variational Bounds are Not Necessarily Better »
Tom Rainforth · Adam Kosiorek · Tuan Anh Le · Chris Maddison · Maximilian Igl · Frank Wood · Yee-Whye Teh