Timezone: »
The gradient noise of SGD is considered to play a central role in the observed strong generalization abilities of deep learning. While past studies confirm that the magnitude and the covariance structure of gradient noise are critical for regularization, it remains unclear whether or not the class of noise distributions is important. In this work we provide negative results by showing that noises in classes different from the SGD noise can also effectively regularize gradient descent. Our finding is based on a novel observation on the structure of the SGD noise: it is the multiplication of the gradient matrix and a sampling noise that arises from the mini-batch sampling procedure. Moreover, the sampling noises unify two kinds of gradient regularizing noises that belong to the Gaussian class: the one using (scaled) Fisher as covariance and the one using the gradient covariance of SGD as covariance. Finally, thanks to the flexibility of choosing noise class, an algorithm is proposed to perform noisy gradient descent that generalizes well, the variant of which even benefits large batch SGD training without hurting generalization.
Author Information
Jingfeng Wu (Johns Hopkins University)
Wenqing Hu (Missouri S&T)
Haoyi Xiong (Baidu Research)
Jun Huan (Styling AI)
Vladimir Braverman (Johns Hopkins University)
Zhanxing Zhu (Peking University)
More from the Same Authors
-
2021 : Adversarial Robustness of Streaming Algorithms through Importance Sampling »
Vladimir Braverman · Avinatan Hasidim · Yossi Matias · Mariano Schain · Sandeep Silwal · Samson Zhou -
2021 : Bi-directional Adaptive Communication for Heterogenous Distributed Learning »
Dmitrii Avdiukhin · Vladimir Braverman -
2021 : Gap-Dependent Unsupervised Exploration for Reinforcement Learning »
Jingfeng Wu · Vladimir Braverman · Lin Yang -
2022 : The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift »
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2023 Poster: Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron »
Jingfeng Wu · Difan Zou · Zixiang Chen · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2023 Poster: Provable Data Subset Selection For Efficient Neural Networks Training »
Morad Tukan · Samson Zhou · Alaa Maalouf · Daniela Rus · Vladimir Braverman · Dan Feldman -
2023 Poster: MonoFlow: Rethinking Divergence GANs via the Perspective of Wasserstein Gradient Flows »
Mingxuan Yi · Zhanxing Zhu · Song Liu -
2023 Poster: AutoCoreset: An Automatic Practical Coreset Construction Framework »
Alaa Maalouf · Morad Tukan · Vladimir Braverman · Daniela Rus -
2022 Poster: Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression »
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2022 Oral: Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression »
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2021 Workshop: ICML Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI »
Quanshi Zhang · Tian Han · Lixin Fan · Zhanxing Zhu · Hang Su · Ying Nian Wu -
2021 Poster: Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization »
Zeke Xie · Li Yuan · Zhanxing Zhu · Masashi Sugiyama -
2021 Spotlight: Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization »
Zeke Xie · Li Yuan · Zhanxing Zhu · Masashi Sugiyama -
2020 Poster: On Breaking Deep Generative Model-based Defenses and Beyond »
Yanzhi Chen · Renjie Xie · Zhanxing Zhu -
2020 Poster: Coresets for Clustering in Graphs of Bounded Treewidth »
Daniel Baker · Vladimir Braverman · Lingxiao Huang · Shaofeng H.-C. Jiang · Robert Krauthgamer · Xuan Wu -
2020 Poster: Schatten Norms in Matrix Streams: Hello Sparsity, Goodbye Dimension »
Vladimir Braverman · Robert Krauthgamer · Aditya Krishnan · Roi Sinoff -
2020 Poster: Obtaining Adjustable Regularization for Free via Iterate Averaging »
Jingfeng Wu · Vladimir Braverman · Lin Yang -
2020 Poster: Informative Dropout for Robust Representation Learning: A Shape-bias Perspective »
Baifeng Shi · Dinghuai Zhang · Qi Dai · Zhanxing Zhu · Yadong Mu · Jingdong Wang -
2020 Poster: FetchSGD: Communication-Efficient Federated Learning with Sketching »
Daniel Rothchild · Ashwinee Panda · Enayat Ullah · Nikita Ivkin · Ion Stoica · Vladimir Braverman · Joseph E Gonzalez · Raman Arora -
2020 Expo Talk Panel: Baidu AutoDL: Automated and Interpretable Deep Learning »
Bolei Zhou · Yi Yang · Quanshi Zhang · Dejing Dou · Haoyi Xiong · Jiahui Yu · Humphrey Shi · Linchao Zhu · Xingjian Li -
2019 Poster: Coresets for Ordered Weighted Clustering »
Vladimir Braverman · Shaofeng Jiang · Robert Krauthgamer · Xuan Wu -
2019 Oral: Coresets for Ordered Weighted Clustering »
Vladimir Braverman · Shaofeng Jiang · Robert Krauthgamer · Xuan Wu -
2019 Poster: Interpreting Adversarially Trained Convolutional Neural Networks »
Tianyuan Zhang · Zhanxing Zhu -
2019 Poster: The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects »
Zhanxing Zhu · Jingfeng Wu · Bing Yu · Lei Wu · Jinwen Ma -
2019 Oral: Interpreting Adversarially Trained Convolutional Neural Networks »
Tianyuan Zhang · Zhanxing Zhu -
2019 Oral: The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects »
Zhanxing Zhu · Jingfeng Wu · Bing Yu · Lei Wu · Jinwen Ma -
2018 Poster: Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order »
Vladimir Braverman · Stephen Chestnut · Robert Krauthgamer · Yi Li · David Woodruff · Lin Yang -
2018 Oral: Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order »
Vladimir Braverman · Stephen Chestnut · Robert Krauthgamer · Yi Li · David Woodruff · Lin Yang -
2017 Poster: Clustering High Dimensional Dynamic Data Streams »
Lin Yang · Harry Lang · Christian Sohler · Vladimir Braverman · Gereon Frahling -
2017 Talk: Clustering High Dimensional Dynamic Data Streams »
Lin Yang · Harry Lang · Christian Sohler · Vladimir Braverman · Gereon Frahling