With a better understanding of the loss surfaces of multilayer networks, we can build more robust and accurate training procedures. It was recently discovered that independently trained SGD solutions can be connected along one-dimensional paths of near-constant training loss. In this paper, we demonstrate that this structure is in fact far richer: there exist mode-connecting simplicial complexes that form multi-dimensional manifolds of low loss, connecting many independently trained models. Building on this discovery, we show how to efficiently construct simplicial complexes for fast ensembling, outperforming independently trained deep ensembles in accuracy, calibration, and robustness to dataset shift. Notably, our approach is easy to apply and requires only a few training epochs to discover a low-loss simplex.
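For intuition, here is a minimal sketch of how ensembling over such a simplex could work, assuming K trained networks whose weight vectors serve as the simplex vertices: ensemble members are drawn as random convex (Dirichlet-weighted) combinations of the vertex weights, and their predictions are averaged. The helper names `make_model` and `vertex_states` are illustrative assumptions, not the paper's released code.

```python
# Hypothetical sketch of simplex-based ensembling in PyTorch; not the
# authors' implementation. `make_model` builds an untrained copy of the
# architecture; `vertex_states` is a list of state_dicts from K
# independently trained vertex models.
import torch

def sample_simplex_member(make_model, vertex_states, alpha=1.0):
    """Build a model whose weights are a Dirichlet-weighted convex
    combination of the simplex vertices' weights."""
    k = len(vertex_states)
    # Dirichlet sample = uniform-ish point in the (k-1)-simplex of mixture weights.
    lam = torch.distributions.Dirichlet(torch.full((k,), alpha)).sample()
    # Mix each parameter tensor key by key (cast to float for mixing).
    mixed = {name: sum(lam[i] * vertex_states[i][name].float() for i in range(k))
             for name in vertex_states[0]}
    model = make_model()
    model.load_state_dict(mixed)
    model.eval()
    return model

def simplex_ensemble_predict(make_model, vertex_states, x, n_samples=10):
    """Average softmax predictions over models sampled from the simplex."""
    with torch.no_grad():
        probs = [torch.softmax(sample_simplex_member(make_model, vertex_states)(x), dim=-1)
                 for _ in range(n_samples)]
    return torch.stack(probs).mean(dim=0)
```

Averaging over Dirichlet samples is a crude stand-in for integrating predictions over the simplex volume; the procedure described in the paper additionally trains the connecting vertices so that the enclosed volume stays at low loss.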
Author Information
Gregory Benton (New York University)
Wesley Maddox (New York University)
Sanae Lotfi (New York University)

I am a PhD student at the Center for Data Science at NYU and a DeepMind fellow, advised by Professor Andrew Gordon Wilson. I am currently interested in designing robust models that can generalize well in and out of distribution. I also work on the closely related question of understanding and quantifying the generalization properties of deep neural networks. More broadly, my research interests include out-of-distribution generalization, Bayesian learning, probabilistic modeling, large-scale optimization, and loss surface analysis. Prior to NYU, I obtained a master’s degree in applied mathematics from Polytechnique Montreal. I was fortunate to work there with Professors Andrea Lodi and Dominique Orban to design stochastic first- and second-order algorithms with compelling theoretical and empirical properties for machine learning and large-scale optimization. I received the Best Master’s Thesis Award in Applied Mathematics at Polytechnique Montreal for this work. I also hold an engineering degree in general engineering and applied mathematics from CentraleSupélec.
Andrew Wilson (New York University)

Andrew Gordon Wilson is faculty in the Courant Institute and Center for Data Science at NYU. His interests include probabilistic modelling, Gaussian processes, Bayesian statistics, physics-inspired machine learning, and loss surfaces and generalization in deep learning. His webpage is https://cims.nyu.edu/~andrewgw.
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling
  Tue. Jul 20th, 04:00 -- 06:00 PM
More from the Same Authors
- 2022: How much Data is Augmentation Worth?
  Jonas Geiping · Gowthami Somepalli · Ravid Shwartz-Ziv · Andrew Wilson · Tom Goldstein · Micah Goldblum
- 2022: Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations
  Polina Kirichenko · Pavel Izmailov · Andrew Wilson
- 2022: On Feature Learning in the Presence of Spurious Correlations
  Pavel Izmailov · Polina Kirichenko · Nate Gruver · Andrew Wilson
- 2022: Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Prior
  Ravid Shwartz-Ziv · Micah Goldblum · Hossein Souri · Sanyam Kapoor · Chen Zhu · Yann LeCun · Andrew Wilson
- 2023 Poster: Simple and Fast Group Robustness by Automatic Feature Reweighting
  Shikai Qiu · Andres Potapczynski · Pavel Izmailov · Andrew Wilson
- 2023 Poster: User-defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems
  Marc Finzi · Anudhyan Boral · Leonardo Zepeda-Nunez · Andrew Wilson · Fei Sha
- 2023 Poster: Function-Space Regularization in Neural Networks: A Probabilistic Perspective
  Tim G. J. Rudner · Sanyam Kapoor · Shikai Qiu · Andrew Wilson
- 2022 Poster: Bayesian Model Selection, the Marginal Likelihood, and Generalization
  Sanae Lotfi · Pavel Izmailov · Gregory Benton · Micah Goldblum · Andrew Wilson
- 2022 Oral: Bayesian Model Selection, the Marginal Likelihood, and Generalization
  Sanae Lotfi · Pavel Izmailov · Gregory Benton · Micah Goldblum · Andrew Wilson
- 2022 Spotlight: Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders
  Samuel Stanton · Wesley Maddox · Nate Gruver · Phillip Maffettone · Emily Delaney · Peyton Greenside · Andrew Wilson
- 2022 Poster: Volatility Based Kernels and Moving Average Means for Accurate Forecasting with Gaussian Processes
  Gregory Benton · Wesley Maddox · Andrew Wilson
- 2022 Poster: Low-Precision Stochastic Gradient Langevin Dynamics
  Ruqi Zhang · Andrew Wilson · Christopher De Sa
- 2022 Poster: Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders
  Samuel Stanton · Wesley Maddox · Nate Gruver · Phillip Maffettone · Emily Delaney · Peyton Greenside · Andrew Wilson
- 2022 Spotlight: Low-Precision Stochastic Gradient Langevin Dynamics
  Ruqi Zhang · Andrew Wilson · Christopher De Sa
- 2022 Spotlight: Volatility Based Kernels and Moving Average Means for Accurate Forecasting with Gaussian Processes
  Gregory Benton · Wesley Maddox · Andrew Wilson
- 2021 Poster: SKIing on Simplices: Kernel Interpolation on the Permutohedral Lattice for Scalable Gaussian Processes
  Sanyam Kapoor · Marc Finzi · Ke Alexander Wang · Andrew Wilson
- 2021 Oral: SKIing on Simplices: Kernel Interpolation on the Permutohedral Lattice for Scalable Gaussian Processes
  Sanyam Kapoor · Marc Finzi · Ke Alexander Wang · Andrew Wilson
- 2021 Poster: Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition
  Shengyang Sun · Jiaxin Shi · Andrew Wilson · Roger Grosse
- 2021 Spotlight: Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition
  Shengyang Sun · Jiaxin Shi · Andrew Wilson · Roger Grosse
- 2021 Poster: A Practical Method for Constructing Equivariant Multilayer Perceptrons for Arbitrary Matrix Groups
  Marc Finzi · Max Welling · Andrew Wilson
- 2021 Oral: A Practical Method for Constructing Equivariant Multilayer Perceptrons for Arbitrary Matrix Groups
  Marc Finzi · Max Welling · Andrew Wilson
- 2021 Poster: What Are Bayesian Neural Network Posteriors Really Like?
  Pavel Izmailov · Sharad Vikram · Matthew Hoffman · Andrew Wilson
- 2021 Oral: What Are Bayesian Neural Network Posteriors Really Like?
  Pavel Izmailov · Sharad Vikram · Matthew Hoffman · Andrew Wilson
- 2020 Poster: Semi-Supervised Learning with Normalizing Flows
  Pavel Izmailov · Polina Kirichenko · Marc Finzi · Andrew Wilson
- 2020 Poster: Randomly Projected Additive Gaussian Processes for Regression
  Ian Delbridge · David S Bindel · Andrew Wilson
- 2020 Poster: Generalizing Convolutional Neural Networks for Equivariance to Lie Groups on Arbitrary Continuous Data
  Marc Finzi · Samuel Stanton · Pavel Izmailov · Andrew Wilson
- 2020 Tutorial: Bayesian Deep Learning and a Probabilistic Perspective of Model Construction
  Andrew Wilson
- 2019: poster session I
  Nicholas Rhinehart · Yunhao Tang · Vinay Prabhu · Dian Ang Yap · Alexander Wang · Marc Finzi · Manoj Kumar · You Lu · Abhishek Kumar · Qi Lei · Michael Przystupa · Nicola De Cao · Polina Kirichenko · Pavel Izmailov · Andrew Wilson · Jakob Kruse · Diego Mesquita · Mario Lezcano Casado · Thomas Müller · Keir Simmons · Andrei Atanov
- 2019: Poster discussion
  Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shorato Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari
- 2019: Subspace Inference for Bayesian Deep Learning
  Polina Kirichenko · Pavel Izmailov · Andrew Wilson
- 2019 Poster: Simple Black-box Adversarial Attacks
  Chuan Guo · Jacob Gardner · Yurong You · Andrew Wilson · Kilian Weinberger
- 2019 Oral: Simple Black-box Adversarial Attacks
  Chuan Guo · Jacob Gardner · Yurong You · Andrew Wilson · Kilian Weinberger
- 2019 Poster: SWALP: Stochastic Weight Averaging in Low Precision Training
  Guandao Yang · Tianyi Zhang · Polina Kirichenko · Junwen Bai · Andrew Wilson · Christopher De Sa
- 2019 Oral: SWALP: Stochastic Weight Averaging in Low Precision Training
  Guandao Yang · Tianyi Zhang · Polina Kirichenko · Junwen Bai · Andrew Wilson · Christopher De Sa
- 2018 Poster: Constant-Time Predictive Distributions for Gaussian Processes
  Geoff Pleiss · Jacob Gardner · Kilian Weinberger · Andrew Wilson
- 2018 Oral: Constant-Time Predictive Distributions for Gaussian Processes
  Geoff Pleiss · Jacob Gardner · Kilian Weinberger · Andrew Wilson