We show that the gradient estimates used in training Deep Gaussian Processes (DGPs) with importance-weighted variational inference are susceptible to signal-to-noise ratio (SNR) issues. Specifically, we show both theoretically and via an extensive empirical evaluation that the SNR of the gradient estimates for the latent variable's variational parameters decreases as the number of importance samples increases. As a result, these gradient estimates degrade to pure noise if the number of importance samples is too large. To address this pathology, we show how doubly-reparameterized gradient estimators, originally proposed for training variational autoencoders, can be adapted to the DGP setting and that the resultant estimators completely remedy the SNR issue, thereby providing more reliable training. Finally, we demonstrate that our fix can lead to consistent improvements in the predictive performance of DGP models.
Author Information
Tim G. J. Rudner (University of Oxford)
I am a PhD Candidate in the Department of Computer Science at the University of Oxford, where I conduct research on probabilistic machine learning with Yarin Gal and Yee Whye Teh. My research interests span **Bayesian deep learning**, **variational inference**, and **reinforcement learning**. I am particularly interested in uncertainty quantification in deep learning, reinforcement learning as probabilistic inference, and probabilistic transfer learning. I am also a **Rhodes Scholar** and an **AI Fellow** at Georgetown University's Center for Security and Emerging Technology.
Oscar Key (UCL)
Yarin Gal (University of Oxford)
Tom Rainforth (University of Oxford)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Poster: On Signal-to-Noise Ratio Issues in Variational Inference for Deep Gaussian Processes »
Thu. Jul 22nd 04:00 -- 06:00 PM Room Virtual
More from the Same Authors
-
2021 : A Practical Notation for Information-Theoretic Quantities between Outcomes and Random Variables »
Andreas Kirsch · Yarin Gal -
2021 : GoldiProx Selection: Faster training by learning what is learnable, not yet learned, and worth learning »
Sören Mindermann · Muhammed Razzak · Adrien Morisot · Aidan Gomez · Sebastian Farquhar · Jan Brauner · Yarin Gal -
2021 : Continual Learning via Function-Space Variational Inference: A Unifying View »
Tim G. J. Rudner · Freddie Bickford Smith · Qixuan Feng · Yee-Whye Teh · Yarin Gal -
2021 : Active Learning under Pool Set Distribution Shift and Noisy Data »
Andreas Kirsch · Tom Rainforth · Yarin Gal -
2021 : Batch Active Learning with Stochastic Acquisition Functions »
Andreas Kirsch · Sebastian Farquhar · Yarin Gal -
2021 : On Low Rank Training of Deep Neural Networks »
Siddhartha Kamalakara · Acyr Locatelli · Bharat Venkitesh · Jimmy Ba · Yarin Gal · Aidan Gomez -
2021 : Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data »
Andrew Jesson · Panagiotis Tigas · Joost van Amersfoort · Andreas Kirsch · Uri Shalit · Yarin Gal -
2021 : A Simple Baseline for Batch Active Learning with Stochastic Acquisition Functions »
Andreas Kirsch · Sebastian Farquhar · Yarin Gal -
2022 : Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations »
Cong Lu · Philip Ball · Tim G. J. Rudner · Jack Parker-Holder · Michael A Osborne · Yee-Whye Teh -
2022 : Plex: Towards Reliability using Pretrained Large Model Extensions »
Dustin Tran · Andreas Kirsch · Balaji Lakshminarayanan · Huiyi Hu · Du Phan · D. Sculley · Jasper Snoek · Jeremiah Liu · Jie Ren · Joost van Amersfoort · Kehang Han · E. Kelly Buchanan · Kevin Murphy · Mark Collier · Mike Dusenberry · Neil Band · Nithum Thain · Rodolphe Jenatton · Tim G. J. Rudner · Yarin Gal · Zachary Nado · Zelda Mariet · Zi Wang · Zoubin Ghahramani -
2022 : Plex: Towards Reliability using Pretrained Large Model Extensions »
Dustin Tran · Andreas Kirsch · Balaji Lakshminarayanan · Huiyi Hu · Du Phan · D. Sculley · Jasper Snoek · Jeremiah Liu · Jie Ren · Joost van Amersfoort · Kehang Han · Estefany Kelly Buchanan · Kevin Murphy · Mark Collier · Michael Dusenberry · Neil Band · Nithum Thain · Rodolphe Jenatton · Tim G. J. Rudner · Yarin Gal · Zachary Nado · Zelda Mariet · Zi Wang · Zoubin Ghahramani -
2023 : BatchGFN: Generative Flow Networks for Batch Active Learning »
Shreshth Malik · Salem Lahlou · Andrew Jesson · Moksh Jain · Nikolay Malkin · Tristan Deleu · Yoshua Bengio · Yarin Gal -
2023 : CLAM: Selective Clarification for Ambiguous Questions with Generative Language Models »
Lorenz Kuhn · Yarin Gal · Sebastian Farquhar -
2023 : Protein Design with Guided Discrete Diffusion »
Nate Gruver · Samuel Stanton · Nathan Frey · Tim G. J. Rudner · Isidro Hotzel · Julien Lafrance-Vanasse · Arvind Rajpal · Kyunghyun Cho · Andrew Wilson -
2023 Poster: DiscoBAX - Discovery of optimal intervention sets in genomic experiment design »
Clare Lyle · Arash Mehrjou · Pascal Notin · Andrew Jesson · Stefan Bauer · Yarin Gal · Patrick Schwab -
2023 Poster: Learning Instance-Specific Augmentations by Capturing Local Invariances »
Ning Miao · Tom Rainforth · Emile Mathieu · Yann Dubois · Yee-Whye Teh · Adam Foster · Hyunjik Kim -
2023 Poster: Optimally-weighted Estimators of the Maximum Mean Discrepancy for Likelihood-Free Inference »
Ayush Bharti · Masha Naslidnyk · Oscar Key · Samuel Kaski · Francois-Xavier Briol -
2023 Poster: CO-BED: Information-Theoretic Contextual Optimization via Bayesian Experimental Design »
Desi Ivanova · Joel Jennings · Tom Rainforth · Cheng Zhang · Adam Foster -
2023 Poster: Drug Discovery under Covariate Shift with Domain-Informed Prior Distributions over Functions »
Leo Klarner · Tim G. J. Rudner · Michael Reutlinger · Torsten Schindler · Garrett Morris · Charlotte Deane · Yee-Whye Teh -
2023 Poster: Differentiable Multi-Target Causal Bayesian Experimental Design »
Panagiotis Tigas · Yashas Annadani · Desi Ivanova · Andrew Jesson · Yarin Gal · Adam Foster · Stefan Bauer -
2023 Poster: Function-Space Regularization in Neural Networks: A Probabilistic Perspective »
Tim G. J. Rudner · Sanyam Kapoor · Shikai Qiu · Andrew Wilson -
2022 Poster: Learning Dynamics and Generalization in Deep Reinforcement Learning »
Clare Lyle · Mark Rowland · Will Dabney · Marta Kwiatkowska · Yarin Gal -
2022 Poster: Continual Learning via Sequential Function-Space Variational Inference »
Tim G. J. Rudner · Freddie Bickford Smith · Qixuan Feng · Yee-Whye Teh · Yarin Gal -
2022 Poster: Prioritized Training on Points that are Learnable, Worth Learning, and not yet Learnt »
Sören Mindermann · Jan Brauner · Muhammed Razzak · Mrinank Sharma · Andreas Kirsch · Winnie Xu · Benedikt Höltgen · Aidan Gomez · Adrien Morisot · Sebastian Farquhar · Yarin Gal -
2022 Spotlight: Learning Dynamics and Generalization in Deep Reinforcement Learning »
Clare Lyle · Mark Rowland · Will Dabney · Marta Kwiatkowska · Yarin Gal -
2022 Spotlight: Prioritized Training on Points that are Learnable, Worth Learning, and not yet Learnt »
Sören Mindermann · Jan Brauner · Muhammed Razzak · Mrinank Sharma · Andreas Kirsch · Winnie Xu · Benedikt Höltgen · Aidan Gomez · Adrien Morisot · Sebastian Farquhar · Yarin Gal -
2022 Spotlight: Continual Learning via Sequential Function-Space Variational Inference »
Tim G. J. Rudner · Freddie Bickford Smith · Qixuan Feng · Yee-Whye Teh · Yarin Gal -
2022 Poster: Tranception: Protein Fitness Prediction with Autoregressive Transformers and Inference-time Retrieval »
Pascal Notin · Mafalda Dias · Jonathan Frazer · Javier Marchena Hurtado · Aidan Gomez · Debora Marks · Yarin Gal -
2022 Spotlight: Tranception: Protein Fitness Prediction with Autoregressive Transformers and Inference-time Retrieval »
Pascal Notin · Mafalda Dias · Jonathan Frazer · Javier Marchena Hurtado · Aidan Gomez · Debora Marks · Yarin Gal -
2021 : Active Learning under Pool Set Distribution Shift and Noisy Data »
Yarin Gal · Tom Rainforth · Andreas Kirsch -
2021 : Continual Learning via Function-Space Variational Inference: A Unifying View »
Yarin Gal · Yee-Whye Teh · Qixuan Feng · Freddie Bickford Smith · Tim G. J. Rudner -
2021 : Invited Talk #1 »
Yarin Gal -
2021 : Live Panel Discussion »
Thomas Dietterich · Chelsea Finn · Kamalika Chaudhuri · Yarin Gal · Uri Shalit -
2021 Poster: Active Testing: Sample-Efficient Model Evaluation »
Jannik Kossen · Sebastian Farquhar · Yarin Gal · Tom Rainforth -
2021 Poster: Deep Adaptive Design: Amortizing Sequential Bayesian Experimental Design »
Adam Foster · Desi Ivanova · Ilyas Malik · Tom Rainforth -
2021 Spotlight: Active Testing: Sample-Efficient Model Evaluation »
Jannik Kossen · Sebastian Farquhar · Yarin Gal · Tom Rainforth -
2021 Oral: Deep Adaptive Design: Amortizing Sequential Bayesian Experimental Design »
Adam Foster · Desi Ivanova · Ilyas Malik · Tom Rainforth -
2021 Poster: Probabilistic Programs with Stochastic Conditioning »
David Tolpin · Yuan Zhou · Tom Rainforth · Hongseok Yang -
2021 Poster: Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding »
Andrew Jesson · Sören Mindermann · Yarin Gal · Uri Shalit -
2021 Spotlight: Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding »
Andrew Jesson · Sören Mindermann · Yarin Gal · Uri Shalit -
2021 Spotlight: Probabilistic Programs with Stochastic Conditioning »
David Tolpin · Yuan Zhou · Tom Rainforth · Hongseok Yang -
2021 Poster: PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning »
Angelos Filos · Clare Lyle · Yarin Gal · Sergey Levine · Natasha Jaques · Gregory Farquhar -
2021 Oral: PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning »
Angelos Filos · Clare Lyle · Yarin Gal · Sergey Levine · Natasha Jaques · Gregory Farquhar -
2020 : "Designing Bayesian-Optimal Experiments with Stochastic Gradients" »
Tom Rainforth -
2020 Poster: Inter-domain Deep Gaussian Processes »
Tim G. J. Rudner · Dino Sejdinovic · Yarin Gal -
2020 Poster: Divide, Conquer, and Combine: a New Inference Strategy for Probabilistic Programs with Stochastic Support »
Yuan Zhou · Hongseok Yang · Yee-Whye Teh · Tom Rainforth -
2020 Poster: Can Autonomous Vehicles Identify, Recover From, and Adapt to Distribution Shifts? »
Angelos Filos · Panagiotis Tigas · Rowan McAllister · Nicholas Rhinehart · Sergey Levine · Yarin Gal -
2020 Poster: Invariant Causal Prediction for Block MDPs »
Amy Zhang · Clare Lyle · Shagun Sodhani · Angelos Filos · Marta Kwiatkowska · Joelle Pineau · Yarin Gal · Doina Precup -
2020 Poster: Uncertainty Estimation Using a Single Deep Deterministic Neural Network »
Joost van Amersfoort · Lewis Smith · Yee-Whye Teh · Yarin Gal -
2019 Poster: Disentangling Disentanglement in Variational Autoencoders »
Emile Mathieu · Tom Rainforth · N Siddharth · Yee-Whye Teh -
2019 Oral: Disentangling Disentanglement in Variational Autoencoders »
Emile Mathieu · Tom Rainforth · N Siddharth · Yee-Whye Teh -
2019 Poster: Amortized Monte Carlo Integration »
Adam Golinski · Frank Wood · Tom Rainforth -
2019 Oral: Amortized Monte Carlo Integration »
Adam Golinski · Frank Wood · Tom Rainforth -
2018 Poster: On Nesting Monte Carlo Estimators »
Tom Rainforth · Rob Cornish · Hongseok Yang · Andrew Warrington · Frank Wood -
2018 Oral: On Nesting Monte Carlo Estimators »
Tom Rainforth · Rob Cornish · Hongseok Yang · Andrew Warrington · Frank Wood -
2018 Poster: Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam »
Mohammad Emtiyaz Khan · Didrik Nielsen · Voot Tangkaratt · Wu Lin · Yarin Gal · Akash Srivastava -
2018 Oral: Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam »
Mohammad Emtiyaz Khan · Didrik Nielsen · Voot Tangkaratt · Wu Lin · Yarin Gal · Akash Srivastava -
2018 Poster: Tighter Variational Bounds are Not Necessarily Better »
Tom Rainforth · Adam Kosiorek · Tuan Anh Le · Chris Maddison · Maximilian Igl · Frank Wood · Yee-Whye Teh -
2018 Oral: Tighter Variational Bounds are Not Necessarily Better »
Tom Rainforth · Adam Kosiorek · Tuan Anh Le · Chris Maddison · Maximilian Igl · Frank Wood · Yee-Whye Teh