The posterior over Bayesian neural network (BNN) parameters is extremely high-dimensional and non-convex. For computational reasons, researchers approximate this posterior using inexpensive mini-batch methods such as mean-field variational inference or stochastic-gradient Markov chain Monte Carlo (SGMCMC). To investigate foundational questions in Bayesian deep learning, we instead use full-batch Hamiltonian Monte Carlo (HMC) on modern architectures. We show that (1) BNNs can achieve significant performance gains over standard training and deep ensembles; (2) a single long HMC chain can provide a comparable representation of the posterior to multiple shorter chains; (3) in contrast to recent studies, we find posterior tempering is not needed for near-optimal performance, with little evidence for a "cold posterior" effect, which we show is largely an artifact of data augmentation; (4) Bayesian model average (BMA) performance is robust to the choice of prior scale, and relatively similar for diagonal Gaussian, mixture-of-Gaussians, and logistic priors; (5) Bayesian neural networks show surprisingly poor generalization under domain shift; (6) while cheaper alternatives such as deep ensembles and SGMCMC can provide good generalization, their predictive distributions are distinct from HMC. Notably, deep ensemble predictive distributions are similarly close to HMC as standard stochastic gradient Langevin dynamics (SGLD), and closer than standard variational inference.
Author Information
Pavel Izmailov (New York University)
Sharad Vikram (Google)
Matthew Hoffman (Google)
Andrew Wilson (New York University)

Andrew Gordon Wilson is a faculty member in the Courant Institute and Center for Data Science at NYU. His interests include probabilistic modelling, Gaussian processes, Bayesian statistics, physics-inspired machine learning, and loss surfaces and generalization in deep learning. His webpage is https://cims.nyu.edu/~andrewgw.
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Poster: What Are Bayesian Neural Network Posteriors Really Like? »
Tue. Jul 20th 04:00 -- 06:00 PM Room Virtual
More from the Same Authors
-
2022 : How much Data is Augmentation Worth? »
Jonas Geiping · Gowthami Somepalli · Ravid Shwartz-Ziv · Andrew Wilson · Tom Goldstein · Micah Goldblum -
2022 : Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations »
Polina Kirichenko · Pavel Izmailov · Andrew Wilson -
2022 : On Feature Learning in the Presence of Spurious Correlations »
Pavel Izmailov · Polina Kirichenko · Nate Gruver · Andrew Wilson -
2022 : Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Prior »
Ravid Shwartz-Ziv · Micah Goldblum · Hossein Souri · Sanyam Kapoor · Chen Zhu · Yann LeCun · Andrew Wilson -
2023 : Understanding the Detrimental Class-level Effects of Data Augmentation »
Polina Kirichenko · Mark Ibrahim · Randall Balestriero · Diane Bouchacourt · Ramakrishna Vedantam · Hamed Firooz · Andrew Wilson -
2023 : Protein Design with Guided Discrete Diffusion »
Nate Gruver · Samuel Stanton · Nathan Frey · Tim G. J. Rudner · Isidro Hotzel · Julien Lafrance-Vanasse · Arvind Rajpal · Kyunghyun Cho · Andrew Wilson -
2023 Poster: Underspecification Presents Challenges for Credibility in Modern Machine Learning »
Alexander D'Amour · Katherine Heller · Dan Moldovan · Ben Adlam · Babak Alipanahi · Alex Beutel · Christina Chen · Jonathan Deaton · Jacob Eisenstein · Matthew Hoffman · Farhad Hormozdiari · Neil Houlsby · Shaobo Hou · Ghassen Jerfel · Alan Karthikesalingam · Mario Lucic · Yian Ma · Cory McLean · Diana Mincu · Akinori Mitani · Andrea Montanari · Zachary Nado · Vivek Natarajan · Christopher Nielson · Thomas F. Osborne · Rajiv Raman · Kim Ramasamy · Rory Sayres · Jessica Schrouff · Martin Seneviratne · Shannon Sequeira · Harini Suresh · Victor Veitch · Maksym Vladymyrov · Xuezhi Wang · Kellie Webster · Steve Yadlowsky · Taedong Yun · Xiaohua Zhai · D. Sculley -
2023 Poster: Simple and Fast Group Robustness by Automatic Feature Reweighting »
Shikai Qiu · Andres Potapczynski · Pavel Izmailov · Andrew Wilson -
2023 Poster: User-defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems »
Marc Finzi · Anudhyan Boral · Andrew Wilson · Fei Sha · Leonardo Zepeda-Nunez -
2023 Poster: Function-Space Regularization in Neural Networks: A Probabilistic Perspective »
Tim G. J. Rudner · Sanyam Kapoor · Shikai Qiu · Andrew Wilson -
2022 Poster: Bayesian Model Selection, the Marginal Likelihood, and Generalization »
Sanae Lotfi · Pavel Izmailov · Gregory Benton · Micah Goldblum · Andrew Wilson -
2022 Oral: Bayesian Model Selection, the Marginal Likelihood, and Generalization »
Sanae Lotfi · Pavel Izmailov · Gregory Benton · Micah Goldblum · Andrew Wilson -
2022 Spotlight: Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders »
Samuel Stanton · Wesley Maddox · Nate Gruver · Phillip Maffettone · Emily Delaney · Peyton Greenside · Andrew Wilson -
2022 Poster: Volatility Based Kernels and Moving Average Means for Accurate Forecasting with Gaussian Processes »
Gregory Benton · Wesley Maddox · Andrew Wilson -
2022 Poster: Low-Precision Stochastic Gradient Langevin Dynamics »
Ruqi Zhang · Andrew Wilson · Christopher De Sa -
2022 Poster: Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders »
Samuel Stanton · Wesley Maddox · Nate Gruver · Phillip Maffettone · Emily Delaney · Peyton Greenside · Andrew Wilson -
2022 Spotlight: Low-Precision Stochastic Gradient Langevin Dynamics »
Ruqi Zhang · Andrew Wilson · Christopher De Sa -
2022 Spotlight: Volatility Based Kernels and Moving Average Means for Accurate Forecasting with Gaussian Processes »
Gregory Benton · Wesley Maddox · Andrew Wilson -
2021 Poster: SKIing on Simplices: Kernel Interpolation on the Permutohedral Lattice for Scalable Gaussian Processes »
Sanyam Kapoor · Marc Finzi · Ke Alexander Wang · Andrew Wilson -
2021 Oral: SKIing on Simplices: Kernel Interpolation on the Permutohedral Lattice for Scalable Gaussian Processes »
Sanyam Kapoor · Marc Finzi · Ke Alexander Wang · Andrew Wilson -
2021 Poster: Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition »
Shengyang Sun · Jiaxin Shi · Andrew Wilson · Roger Grosse -
2021 Spotlight: Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition »
Shengyang Sun · Jiaxin Shi · Andrew Wilson · Roger Grosse -
2021 Poster: A Practical Method for Constructing Equivariant Multilayer Perceptrons for Arbitrary Matrix Groups »
Marc Finzi · Max Welling · Andrew Wilson -
2021 Oral: A Practical Method for Constructing Equivariant Multilayer Perceptrons for Arbitrary Matrix Groups »
Marc Finzi · Max Welling · Andrew Wilson -
2021 Poster: Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling »
Gregory Benton · Wesley Maddox · Sanae Lotfi · Andrew Wilson -
2021 Spotlight: Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling »
Gregory Benton · Wesley Maddox · Sanae Lotfi · Andrew Wilson -
2020 Poster: Semi-Supervised Learning with Normalizing Flows »
Pavel Izmailov · Polina Kirichenko · Marc Finzi · Andrew Wilson -
2020 Poster: Black-Box Variational Inference as a Parametric Approximation to Langevin Dynamics »
Matthew Hoffman · Yian Ma -
2020 Poster: Automatic Reparameterisation of Probabilistic Programs »
Maria Gorinova · Dave Moore · Matthew Hoffman -
2020 Poster: Randomly Projected Additive Gaussian Processes for Regression »
Ian Delbridge · David S Bindel · Andrew Wilson -
2020 Poster: Generalizing Convolutional Neural Networks for Equivariance to Lie Groups on Arbitrary Continuous Data »
Marc Finzi · Samuel Stanton · Pavel Izmailov · Andrew Wilson -
2020 Tutorial: Bayesian Deep Learning and a Probabilistic Perspective of Model Construction »
Andrew Wilson -
2019 : poster session I »
Nicholas Rhinehart · Yunhao Tang · Vinay Prabhu · Dian Ang Yap · Alexander Wang · Marc Finzi · Manoj Kumar · You Lu · Abhishek Kumar · Qi Lei · Michael Przystupa · Nicola De Cao · Polina Kirichenko · Pavel Izmailov · Andrew Wilson · Jakob Kruse · Diego Mesquita · Mario Lezcano Casado · Thomas Müller · Keir Simmons · Andrei Atanov -
2019 : Poster discussion »
Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shorato Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari -
2019 : Subspace Inference for Bayesian Deep Learning »
Polina Kirichenko · Pavel Izmailov · Andrew Wilson -
2019 : Networking Lunch (provided) + Poster Session »
Abraham Stanway · Alex Robson · Aneesh Rangnekar · Ashesh Chattopadhyay · Ashley Pilipiszyn · Benjamin LeRoy · Bolong Cheng · Ce Zhang · Chaopeng Shen · Christian Schroeder · Christian Clough · Clement DUHART · Clement Fung · Cozmin Ududec · Dali Wang · David Dao · di wu · Dimitrios Giannakis · Dino Sejdinovic · Doina Precup · Duncan Watson-Parris · Gege Wen · George Chen · Gopal Erinjippurath · Haifeng Li · Han Zou · Herke van Hoof · Hillary A Scannell · Hiroshi Mamitsuka · Hongbao Zhang · Jaegul Choo · James Wang · James Requeima · Jessica Hwang · Jinfan Xu · Johan Mathe · Jonathan Binas · Joonseok Lee · Kalai Ramea · Kate Duffy · Kevin McCloskey · Kris Sankaran · Lester Mackey · Letif Mones · Loubna Benabbou · Lynn Kaack · Matthew Hoffman · Mayur Mudigonda · Mehrdad Mahdavi · Michael McCourt · Mingchao Jiang · Mohammad Mahdi Kamani · Neel Guha · Niccolo Dalmasso · Nick Pawlowski · Nikola Milojevic-Dupont · Paulo Orenstein · Pedram Hassanzadeh · Pekka Marttinen · Ramesh Nair · Sadegh Farhang · Samuel Kaski · Sandeep Manjanna · Sasha Luccioni · Shuby Deshpande · Soo Kim · Soukayna Mouatadid · Sunghyun Park · Tao Lin · Telmo Felgueira · Thomas Hornigold · Tianle Yuan · Tom Beucler · Tracy Cui · Volodymyr Kuleshov · Wei Yu · yang song · Ydo Wexler · Yoshua Bengio · Zhecheng Wang · Zhuangfang Yi · Zouheir Malki -
2019 Poster: Simple Black-box Adversarial Attacks »
Chuan Guo · Jacob Gardner · Yurong You · Andrew Wilson · Kilian Weinberger -
2019 Oral: Simple Black-box Adversarial Attacks »
Chuan Guo · Jacob Gardner · Yurong You · Andrew Wilson · Kilian Weinberger -
2019 Poster: SWALP : Stochastic Weight Averaging in Low Precision Training »
Guandao Yang · Tianyi Zhang · Polina Kirichenko · Junwen Bai · Andrew Wilson · Christopher De Sa -
2019 Oral: SWALP : Stochastic Weight Averaging in Low Precision Training »
Guandao Yang · Tianyi Zhang · Polina Kirichenko · Junwen Bai · Andrew Wilson · Christopher De Sa -
2018 Poster: Constant-Time Predictive Distributions for Gaussian Processes »
Geoff Pleiss · Jacob Gardner · Kilian Weinberger · Andrew Wilson -
2018 Oral: Constant-Time Predictive Distributions for Gaussian Processes »
Geoff Pleiss · Jacob Gardner · Kilian Weinberger · Andrew Wilson -
2017 Poster: Learning Deep Latent Gaussian Models with Markov Chain Monte Carlo »
Matthew Hoffman -
2017 Talk: Learning Deep Latent Gaussian Models with Markov Chain Monte Carlo »
Matthew Hoffman