While low-precision optimization has been widely used to accelerate deep learning, low-precision sampling remains largely unexplored. As a consequence, sampling is simply infeasible in many large-scale scenarios, despite providing remarkable benefits to generalization and uncertainty estimation for neural networks. In this paper, we provide the first study of low-precision Stochastic Gradient Langevin Dynamics (SGLD), showing that its costs can be significantly reduced without sacrificing performance, due to its intrinsic ability to handle system noise. We prove that the convergence of low-precision SGLD with full-precision gradient accumulators is less affected by the quantization error than its SGD counterpart in the strongly convex setting. To further enable low-precision gradient accumulators, we develop a new quantization function for SGLD that preserves the variance in each update step. We demonstrate that low-precision SGLD achieves comparable performance to full-precision SGLD with only 8 bits on a variety of deep learning tasks.
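The abstract's two central ingredients — an SGLD step followed by quantization of the stored weights — can be illustrated with a minimal NumPy sketch. This is not the paper's variance-preserving quantization function; it shows only the generic low-precision pattern with stochastic rounding (which keeps the quantization unbiased), and the function names and the quantization gap `delta` are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x, delta):
    """Quantize x to the grid {k * delta} with stochastic rounding:
    round up with probability equal to the fractional position between
    grid points, so that E[Q(x)] = x (the quantizer is unbiased)."""
    scaled = x / delta
    floor = np.floor(scaled)
    prob_up = scaled - floor  # distance to the lower grid point
    rounded = floor + (rng.random(x.shape) < prob_up)
    return rounded * delta

def sgld_step_low_precision(theta, grad, lr, delta):
    """One SGLD update computed in full precision, then quantized
    before the iterate is stored in a low-precision representation.
    theta: parameter vector; grad: stochastic gradient estimate;
    lr: step size; delta: quantization gap (illustrative, e.g. 2**-6)."""
    noise = rng.normal(scale=np.sqrt(2 * lr), size=theta.shape)
    theta_new = theta - lr * grad + noise  # full-precision SGLD step
    return stochastic_round(theta_new, delta)  # low-precision storage
```

Because stochastic rounding is unbiased, the quantization error behaves as extra zero-mean noise, which is the kind of perturbation a sampler like SGLD is designed to tolerate; the paper's quantizer goes further by also controlling the variance added per step.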
Author Information
Ruqi Zhang (UT Austin/Purdue)
Andrew Wilson (New York University)

Andrew Gordon Wilson is a faculty member in the Courant Institute and the Center for Data Science at NYU. His interests include probabilistic modelling, Gaussian processes, Bayesian statistics, physics-inspired machine learning, and loss surfaces and generalization in deep learning. His webpage is https://cims.nyu.edu/~andrewgw.
Christopher De Sa (Cornell)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Poster: Low-Precision Stochastic Gradient Langevin Dynamics »
  Tue. Jul 19 through Wed. Jul 20, Hall E #705
More from the Same Authors
- 2022 : How much Data is Augmentation Worth? »
  Jonas Geiping · Gowthami Somepalli · Ravid Shwartz-Ziv · Andrew Wilson · Tom Goldstein · Micah Goldblum
- 2022 : Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations »
  Polina Kirichenko · Pavel Izmailov · Andrew Wilson
- 2022 : On Feature Learning in the Presence of Spurious Correlations »
  Pavel Izmailov · Polina Kirichenko · Nate Gruver · Andrew Wilson
- 2022 : Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Prior »
  Ravid Shwartz-Ziv · Micah Goldblum · Hossein Souri · Sanyam Kapoor · Chen Zhu · Yann LeCun · Andrew Wilson
- 2023 : Understanding the Detrimental Class-level Effects of Data Augmentation »
  Polina Kirichenko · Mark Ibrahim · Randall Balestriero · Diane Bouchacourt · Ramakrishna Vedantam · Hamed Firooz · Andrew Wilson
- 2023 : Protein Design with Guided Discrete Diffusion »
  Nate Gruver · Samuel Stanton · Nathan Frey · Tim G. J. Rudner · Isidro Hotzel · Julien Lafrance-Vanasse · Arvind Rajpal · Kyunghyun Cho · Andrew Wilson
- 2023 Poster: Simple and Fast Group Robustness by Automatic Feature Reweighting »
  Shikai Qiu · Andres Potapczynski · Pavel Izmailov · Andrew Wilson
- 2023 Poster: User-defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems »
  Marc Finzi · Anudhyan Boral · Andrew Wilson · Fei Sha · Leonardo Zepeda-Nunez
- 2023 Poster: Function-Space Regularization in Neural Networks: A Probabilistic Perspective »
  Tim G. J. Rudner · Sanyam Kapoor · Shikai Qiu · Andrew Wilson
- 2022 : MCTensor: A High-Precision Deep Learning Library with Multi-Component Floating-Point »
  Tao Yu · Wentao Guo · Canal Li · Tiancheng Yuan · Christopher De Sa
- 2022 : Riemannian Residual Neural Networks »
  Isay Katsman · Eric Chen · Sidhanth Holalkere · Aaron Lou · Ser Nam Lim · Christopher De Sa
- 2022 Poster: Bayesian Model Selection, the Marginal Likelihood, and Generalization »
  Sanae Lotfi · Pavel Izmailov · Gregory Benton · Micah Goldblum · Andrew Wilson
- 2022 Oral: Bayesian Model Selection, the Marginal Likelihood, and Generalization »
  Sanae Lotfi · Pavel Izmailov · Gregory Benton · Micah Goldblum · Andrew Wilson
- 2022 Spotlight: Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders »
  Samuel Stanton · Wesley Maddox · Nate Gruver · Phillip Maffettone · Emily Delaney · Peyton Greenside · Andrew Wilson
- 2022 Poster: Volatility Based Kernels and Moving Average Means for Accurate Forecasting with Gaussian Processes »
  Gregory Benton · Wesley Maddox · Andrew Wilson
- 2022 Poster: A Langevin-like Sampler for Discrete Distributions »
  Ruqi Zhang · Xingchao Liu · Qiang Liu
- 2022 Poster: Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders »
  Samuel Stanton · Wesley Maddox · Nate Gruver · Phillip Maffettone · Emily Delaney · Peyton Greenside · Andrew Wilson
- 2022 Spotlight: A Langevin-like Sampler for Discrete Distributions »
  Ruqi Zhang · Xingchao Liu · Qiang Liu
- 2022 Spotlight: Volatility Based Kernels and Moving Average Means for Accurate Forecasting with Gaussian Processes »
  Gregory Benton · Wesley Maddox · Andrew Wilson
- 2021 Poster: SKIing on Simplices: Kernel Interpolation on the Permutohedral Lattice for Scalable Gaussian Processes »
  Sanyam Kapoor · Marc Finzi · Ke Alexander Wang · Andrew Wilson
- 2021 Oral: SKIing on Simplices: Kernel Interpolation on the Permutohedral Lattice for Scalable Gaussian Processes »
  Sanyam Kapoor · Marc Finzi · Ke Alexander Wang · Andrew Wilson
- 2021 Poster: Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition »
  Shengyang Sun · Jiaxin Shi · Andrew Wilson · Roger Grosse
- 2021 Spotlight: Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition »
  Shengyang Sun · Jiaxin Shi · Andrew Wilson · Roger Grosse
- 2021 Poster: Variance Reduced Training with Stratified Sampling for Forecasting Models »
  Yucheng Lu · Youngsuk Park · Lifan Chen · Yuyang Wang · Christopher De Sa · Dean Foster
- 2021 Spotlight: Variance Reduced Training with Stratified Sampling for Forecasting Models »
  Yucheng Lu · Youngsuk Park · Lifan Chen · Yuyang Wang · Christopher De Sa · Dean Foster
- 2021 Poster: A Practical Method for Constructing Equivariant Multilayer Perceptrons for Arbitrary Matrix Groups »
  Marc Finzi · Max Welling · Andrew Wilson
- 2021 Poster: Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision »
  Johan Björck · Xiangyu Chen · Christopher De Sa · Carla Gomes · Kilian Weinberger
- 2021 Spotlight: Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision »
  Johan Björck · Xiangyu Chen · Christopher De Sa · Carla Gomes · Kilian Weinberger
- 2021 Oral: A Practical Method for Constructing Equivariant Multilayer Perceptrons for Arbitrary Matrix Groups »
  Marc Finzi · Max Welling · Andrew Wilson
- 2021 Poster: Optimal Complexity in Decentralized Training »
  Yucheng Lu · Christopher De Sa
- 2021 Poster: What Are Bayesian Neural Network Posteriors Really Like? »
  Pavel Izmailov · Sharad Vikram · Matthew Hoffman · Andrew Wilson
- 2021 Poster: Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling »
  Gregory Benton · Wesley Maddox · Sanae Lotfi · Andrew Wilson
- 2021 Spotlight: Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling »
  Gregory Benton · Wesley Maddox · Sanae Lotfi · Andrew Wilson
- 2021 Oral: What Are Bayesian Neural Network Posteriors Really Like? »
  Pavel Izmailov · Sharad Vikram · Matthew Hoffman · Andrew Wilson
- 2021 Oral: Optimal Complexity in Decentralized Training »
  Yucheng Lu · Christopher De Sa
- 2020 Poster: Semi-Supervised Learning with Normalizing Flows »
  Pavel Izmailov · Polina Kirichenko · Marc Finzi · Andrew Wilson
- 2020 Poster: Randomly Projected Additive Gaussian Processes for Regression »
  Ian Delbridge · David S Bindel · Andrew Wilson
- 2020 Poster: Moniqua: Modulo Quantized Communication in Decentralized SGD »
  Yucheng Lu · Christopher De Sa
- 2020 Poster: Generalizing Convolutional Neural Networks for Equivariance to Lie Groups on Arbitrary Continuous Data »
  Marc Finzi · Samuel Stanton · Pavel Izmailov · Andrew Wilson
- 2020 Poster: Differentiating through the Fréchet Mean »
  Aaron Lou · Isay Katsman · Qingxuan Jiang · Serge Belongie · Ser Nam Lim · Christopher De Sa
- 2020 Tutorial: Bayesian Deep Learning and a Probabilistic Perspective of Model Construction »
  Andrew Wilson
- 2019 : poster session I »
  Nicholas Rhinehart · Yunhao Tang · Vinay Prabhu · Dian Ang Yap · Alexander Wang · Marc Finzi · Manoj Kumar · You Lu · Abhishek Kumar · Qi Lei · Michael Przystupa · Nicola De Cao · Polina Kirichenko · Pavel Izmailov · Andrew Wilson · Jakob Kruse · Diego Mesquita · Mario Lezcano Casado · Thomas Müller · Keir Simmons · Andrei Atanov
- 2019 : Poster discussion »
  Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shotaro Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari
- 2019 : Subspace Inference for Bayesian Deep Learning »
  Polina Kirichenko · Pavel Izmailov · Andrew Wilson
- 2019 Poster: Simple Black-box Adversarial Attacks »
  Chuan Guo · Jacob Gardner · Yurong You · Andrew Wilson · Kilian Weinberger
- 2019 Poster: Distributed Learning with Sublinear Communication »
  Jayadev Acharya · Christopher De Sa · Dylan Foster · Karthik Sridharan
- 2019 Oral: Simple Black-box Adversarial Attacks »
  Chuan Guo · Jacob Gardner · Yurong You · Andrew Wilson · Kilian Weinberger
- 2019 Oral: Distributed Learning with Sublinear Communication »
  Jayadev Acharya · Christopher De Sa · Dylan Foster · Karthik Sridharan
- 2019 Poster: SWALP : Stochastic Weight Averaging in Low Precision Training »
  Guandao Yang · Tianyi Zhang · Polina Kirichenko · Junwen Bai · Andrew Wilson · Christopher De Sa
- 2019 Poster: A Kernel Theory of Modern Data Augmentation »
  Tri Dao · Albert Gu · Alexander J Ratner · Virginia Smith · Christopher De Sa · Christopher Re
- 2019 Poster: Improving Neural Network Quantization without Retraining using Outlier Channel Splitting »
  Ritchie Zhao · Yuwei Hu · Jordan Dotzel · Christopher De Sa · Zhiru Zhang
- 2019 Oral: SWALP : Stochastic Weight Averaging in Low Precision Training »
  Guandao Yang · Tianyi Zhang · Polina Kirichenko · Junwen Bai · Andrew Wilson · Christopher De Sa
- 2019 Oral: Improving Neural Network Quantization without Retraining using Outlier Channel Splitting »
  Ritchie Zhao · Yuwei Hu · Jordan Dotzel · Christopher De Sa · Zhiru Zhang
- 2019 Oral: A Kernel Theory of Modern Data Augmentation »
  Tri Dao · Albert Gu · Alexander J Ratner · Virginia Smith · Christopher De Sa · Christopher Re
- 2018 Poster: Constant-Time Predictive Distributions for Gaussian Processes »
  Geoff Pleiss · Jacob Gardner · Kilian Weinberger · Andrew Wilson
- 2018 Poster: Minibatch Gibbs Sampling on Large Graphical Models »
  Christopher De Sa · Vincent Chen · Wong
- 2018 Oral: Minibatch Gibbs Sampling on Large Graphical Models »
  Christopher De Sa · Vincent Chen · Wong
- 2018 Oral: Constant-Time Predictive Distributions for Gaussian Processes »
  Geoff Pleiss · Jacob Gardner · Kilian Weinberger · Andrew Wilson
- 2018 Poster: Representation Tradeoffs for Hyperbolic Embeddings »
  Frederic Sala · Christopher De Sa · Albert Gu · Christopher Re
- 2018 Oral: Representation Tradeoffs for Hyperbolic Embeddings »
  Frederic Sala · Christopher De Sa · Albert Gu · Christopher Re