Timezone: »
Dropout, a~stochastic regularisation technique for training of neural networks, has recently been reinterpreted as a~specific type of approximate inference algorithm for Bayesian neural networks. The~main contribution of the~reinterpretation is in providing a~theoretical framework useful for analysing and extending the~algorithm. We show that the~proposed framework suffers from several issues; from undefined or pathological behaviour of the~true posterior related to use of improper priors, to an ill-defined variational objective due to singularity of the~approximating distribution relative to the~true posterior. Our analysis of the~improper log uniform prior used in variational Gaussian dropout suggests the~pathologies are generally irredeemable, and that the~algorithm still works only because the~variational formulation annuls some of the~pathologies. To address the~singularity issue, we proffer Quasi-KL (QKL) divergence, a~new approximate inference objective for approximation of high-dimensional distributions. We show that motivations for variational Bernoulli dropout based on discretisation and noise have QKL as a limit. Properties of QKL are studied both theoretically and on a~simple practical example which shows that the~QKL-optimal approximation of a~full rank Gaussian with a~degenerate one naturally leads to the~Principal Component Analysis solution.
Author Information
Jiri Hron (University of Cambridge)
Alexander Matthews (University of Cambridge)
Research Associate at University of Cambridge Interested in Deep Learning, Gaussian Processes, Scalable Approximate Inference, Privacy in Machine Learning
Zoubin Ghahramani (University of Cambridge & Uber)
Zoubin Ghahramani is a Professor at the University of Cambridge, and Chief Scientist at Uber. He is also Deputy Director of the Leverhulme Centre for the Future of Intelligence, was a founding Director of the Alan Turing Institute and co-founder of Geometric Intelligence (now Uber AI Labs). His research focuses on probabilistic approaches to machine learning and AI. In 2015 he was elected a Fellow of the Royal Society.
Related Events (a corresponding poster, oral, or spotlight)
-
2018 Oral: Variational Bayesian dropout: pitfalls and fixes »
Thu. Jul 12th 02:00 -- 02:20 PM Room A4
More from the Same Authors
-
2022 : Plex: Towards Reliability using Pretrained Large Model Extensions »
Dustin Tran · Andreas Kirsch · Balaji Lakshminarayanan · Huiyi Hu · Du Phan · D. Sculley · Jasper Snoek · Jeremiah Liu · Jie Ren · Joost van Amersfoort · Kehang Han · E. Kelly Buchanan · Kevin Murphy · Mark Collier · Mike Dusenberry · Neil Band · Nithum Thain · Rodolphe Jenatton · Tim G. J Rudner · Yarin Gal · Zachary Nado · Zelda Mariet · Zi Wang · Zoubin Ghahramani -
2022 : Plex: Towards Reliability using Pretrained Large Model Extensions »
Dustin Tran · Andreas Kirsch · Balaji Lakshminarayanan · Huiyi Hu · Du Phan · D. Sculley · Jasper Snoek · Jeremiah Liu · JIE REN · Joost van Amersfoort · Kehang Han · Estefany Kelly Buchanan · Kevin Murphy · Mark Collier · Michael Dusenberry · Neil Band · Nithum Thain · Rodolphe Jenatton · Tim G. J Rudner · Yarin Gal · Zachary Nado · Zelda Mariet · Zi Wang · Zoubin Ghahramani -
2023 Poster: Neural Diffusion Processes »
Vincent Dutordoir · Alan Saul · Zoubin Ghahramani · Fergus Simpson -
2022 : Plex: Towards Reliability using Pretrained Large Model Extensions »
Dustin Tran · Andreas Kirsch · Balaji Lakshminarayanan · Huiyi Hu · Du Phan · D. Sculley · Jasper Snoek · Jeremiah Liu · JIE REN · Joost van Amersfoort · Kehang Han · Estefany Kelly Buchanan · Kevin Murphy · Mark Collier · Michael Dusenberry · Neil Band · Nithum Thain · Rodolphe Jenatton · Tim G. J Rudner · Yarin Gal · Zachary Nado · Zelda Mariet · Zi Wang · Zoubin Ghahramani -
2022 Poster: Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling »
Jiri Hron · Roman Novak · Jeffrey Pennington · Jascha Sohl-Dickstein -
2022 Spotlight: Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling »
Jiri Hron · Roman Novak · Jeffrey Pennington · Jascha Sohl-Dickstein -
2020 Poster: Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits »
Robert Peharz · Steven Lang · Antonio Vergari · Karl Stelzner · Alejandro Molina · Martin Trapp · Guy Van den Broeck · Kristian Kersting · Zoubin Ghahramani -
2020 Poster: Infinite attention: NNGP and NTK for deep attention networks »
Jiri Hron · Yasaman Bahri · Jascha Sohl-Dickstein · Roman Novak -
2019 : Poster discussion »
Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shorato Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari -
2018 Poster: The Mirage of Action-Dependent Baselines in Reinforcement Learning »
George Tucker · Surya Bhupatiraju · Shixiang Gu · Richard E Turner · Zoubin Ghahramani · Sergey Levine -
2018 Oral: The Mirage of Action-Dependent Baselines in Reinforcement Learning »
George Tucker · Surya Bhupatiraju · Shixiang Gu · Richard E Turner · Zoubin Ghahramani · Sergey Levine -
2018 Poster: Discovering Interpretable Representations for Both Deep Generative and Discriminative Models »
Tameem Adel · Zoubin Ghahramani · Adrian Weller -
2018 Oral: Discovering Interpretable Representations for Both Deep Generative and Discriminative Models »
Tameem Adel · Zoubin Ghahramani · Adrian Weller -
2017 Poster: Magnetic Hamiltonian Monte Carlo »
Nilesh Tripuraneni · Mark Rowland · Zoubin Ghahramani · Richard E Turner -
2017 Talk: Magnetic Hamiltonian Monte Carlo »
Nilesh Tripuraneni · Mark Rowland · Zoubin Ghahramani · Richard E Turner -
2017 Poster: Lost Relatives of the Gumbel Trick »
Matej Balog · Nilesh Tripuraneni · Zoubin Ghahramani · Adrian Weller -
2017 Poster: Bayesian inference on random simple graphs with power law degree distributions »
Juho Lee · Creighton Heaukulani · Zoubin Ghahramani · Lancelot F. James · Seungjin Choi -
2017 Talk: Lost Relatives of the Gumbel Trick »
Matej Balog · Nilesh Tripuraneni · Zoubin Ghahramani · Adrian Weller -
2017 Talk: Bayesian inference on random simple graphs with power law degree distributions »
Juho Lee · Creighton Heaukulani · Zoubin Ghahramani · Lancelot F. James · Seungjin Choi -
2017 Poster: Automatic Discovery of the Statistical Types of Variables in a Dataset »
Isabel Valera · Zoubin Ghahramani -
2017 Poster: A Birth-Death Process for Feature Allocation »
Konstantina Palla · David Knowles · Zoubin Ghahramani -
2017 Poster: Deep Bayesian Active Learning with Image Data »
Yarin Gal · Riashat Islam · Zoubin Ghahramani -
2017 Talk: A Birth-Death Process for Feature Allocation »
Konstantina Palla · David Knowles · Zoubin Ghahramani -
2017 Talk: Deep Bayesian Active Learning with Image Data »
Yarin Gal · Riashat Islam · Zoubin Ghahramani -
2017 Talk: Automatic Discovery of the Statistical Types of Variables in a Dataset »
Isabel Valera · Zoubin Ghahramani