Understanding the behavior of stochastic gradient descent (SGD) in the context of deep neural networks has received significant attention recently. Along this line, we study a general form of gradient-based optimization dynamics with unbiased noise, which unifies SGD and standard Langevin dynamics. By investigating these general dynamics, we analyze the behavior of SGD in escaping from minima and its regularization effects. A novel indicator is derived to characterize the efficiency of escaping from minima, by measuring the alignment between the noise covariance and the curvature of the loss function. Based on this indicator, two conditions are established to show which types of noise structure are superior to isotropic noise in terms of escaping efficiency. We further show that the anisotropic noise in SGD satisfies these two conditions and thus helps SGD escape from sharp and poor minima effectively, toward more stable and flat minima that typically generalize well. We design systematic experiments to verify the benefits of the anisotropic noise, compared with full gradient descent plus isotropic diffusion (i.e., Langevin dynamics).
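To make the escaping-efficiency argument concrete, below is a minimal numerical sketch (not part of the paper) comparing isotropic noise with anisotropic noise of equal trace on a two-dimensional quadratic loss that has one sharp and one flat direction. The Hessian H, the covariances Sigma_iso and Sigma_aniso, the step size, the helper expected_escape_loss, and the use of Tr(H Sigma) as a simplified alignment/escaping indicator are all illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy quadratic loss L(theta) = 0.5 * theta^T H theta around a sharp minimum at the origin.
H = np.diag([10.0, 0.1])   # one sharp direction (curvature 10) and one flat direction (0.1)

def expected_escape_loss(Sigma, lr=0.01, steps=10, n_chains=5000):
    """Average loss after a few noisy gradient steps started at the minimum.

    Update: theta <- theta - lr * (H theta + n), with n ~ N(0, Sigma).
    A larger average loss means the noise drives iterates out of the minimum faster.
    """
    chol = np.linalg.cholesky(Sigma)
    theta = np.zeros((n_chains, 2))
    for _ in range(steps):
        noise = rng.standard_normal((n_chains, 2)) @ chol.T
        theta = theta - lr * (theta @ H.T + noise)
    return np.mean(0.5 * np.einsum("ij,jk,ik->i", theta, H, theta))

# Two noise covariances with the same total magnitude (equal trace):
Sigma_iso = np.eye(2)                # isotropic, Langevin-style noise
Sigma_aniso = np.diag([1.9, 0.1])    # anisotropic noise concentrated on the sharp direction

# Assumed indicator: Tr(H Sigma), the short-time rate of expected loss increase
# under a quadratic approximation (a simplified stand-in for the paper's indicator).
for name, Sigma in [("isotropic", Sigma_iso), ("anisotropic", Sigma_aniso)]:
    print(f"{name:11s}  Tr(H Sigma) = {np.trace(H @ Sigma):6.2f}  "
          f"mean loss after 10 steps = {expected_escape_loss(Sigma):.5f}")
```

Under these assumptions, the anisotropic covariance concentrated on the sharp direction gives a larger Tr(H Sigma) and a larger average loss increase after a few steps than equal-trace isotropic noise, matching the qualitative claim in the abstract.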
Author Information
Zhanxing Zhu (Peking University)
Jingfeng Wu (Johns Hopkins University)
Bing Yu (Peking University)
Lei Wu (Princeton University)
Jinwen Ma (Peking University)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Oral: The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
  Wed. Jun 12th, 07:00 -- 07:05 PM, Room 104
More from the Same Authors
- 2023 Poster: MonoFlow: Rethinking Divergence GANs via the Perspective of Wasserstein Gradient Flows
  Mingxuan Yi · Zhanxing Zhu · Song Liu
- 2022 Poster: PDO-s3DCNNs: Partial Differential Operator Based Steerable 3D CNNs
  Zhengyang Shen · Tao Hong · Qi She · Jinwen Ma · Zhouchen Lin
- 2022 Spotlight: PDO-s3DCNNs: Partial Differential Operator Based Steerable 3D CNNs
  Zhengyang Shen · Tao Hong · Qi She · Jinwen Ma · Zhouchen Lin
- 2021 Workshop: ICML Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI
  Quanshi Zhang · Tian Han · Lixin Fan · Zhanxing Zhu · Hang Su · Ying Nian Wu
- 2021 Poster: Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
  Zeke Xie · Li Yuan · Zhanxing Zhu · Masashi Sugiyama
- 2021 Spotlight: Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
  Zeke Xie · Li Yuan · Zhanxing Zhu · Masashi Sugiyama
- 2020 Poster: On Breaking Deep Generative Model-based Defenses and Beyond
  Yanzhi Chen · Renjie Xie · Zhanxing Zhu
- 2020 Poster: PDO-eConvs: Partial Differential Operator Based Equivariant Convolutions
  Zhengyang Shen · Lingshen He · Zhouchen Lin · Jinwen Ma
- 2020 Poster: Informative Dropout for Robust Representation Learning: A Shape-bias Perspective
  Baifeng Shi · Dinghuai Zhang · Qi Dai · Zhanxing Zhu · Yadong Mu · Jingdong Wang
- 2020 Poster: On the Noisy Gradient Descent that Generalizes as SGD
  Jingfeng Wu · Wenqing Hu · Haoyi Xiong · Jun Huan · Vladimir Braverman · Zhanxing Zhu
- 2019 Poster: Interpreting Adversarially Trained Convolutional Neural Networks
  Tianyuan Zhang · Zhanxing Zhu
- 2019 Oral: Interpreting Adversarially Trained Convolutional Neural Networks
  Tianyuan Zhang · Zhanxing Zhu