It is well known that stochastic gradient noise (SGN) acts as implicit regularization for deep learning and is essential for both the optimization and the generalization of deep networks. Some works have attempted to artificially simulate SGN by injecting random noise to improve deep learning. However, simple injected random noise turned out not to work as well as SGN, which is anisotropic and parameter-dependent. To simulate SGN at low computational cost and without changing the learning rate or batch size, we propose the Positive-Negative Momentum (PNM) approach, a powerful alternative to conventional Momentum in classic optimizers. PNM maintains two approximately independent momentum terms, so we can control the magnitude of SGN explicitly by adjusting the momentum difference. We theoretically prove the convergence guarantee and the generalization advantage of PNM over Stochastic Gradient Descent (SGD). By incorporating PNM into two conventional optimizers, SGD with Momentum and Adam, our extensive experiments empirically verify the significant advantage of the PNM-based variants over the corresponding conventional Momentum-based optimizers. Code: https://github.com/zeke-xie/Positive-Negative-Momentum
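The abstract describes the mechanism only in words, so the following is a minimal sketch of how two alternately updated momentum buffers and a positive-negative combination might look for a single scalar parameter. The update form, the function names `pnm_step` and `minimize_quadratic`, and the hyperparameters `beta0`, `beta1`, and `lr` are assumptions made for illustration and are not taken from the text above; the linked repository is the authoritative implementation.

```python
import math


def pnm_step(theta, grad, m_prev, m_prev2, lr=0.1, beta1=0.9, beta0=1.0):
    """One illustrative positive-negative momentum step on a scalar parameter.

    m_prev  : momentum buffer refreshed at the previous step
    m_prev2 : momentum buffer refreshed two steps ago
    All names and default values here are hypothetical.
    """
    # Refresh the older buffer with the new gradient. Each buffer therefore
    # sees only every other mini-batch gradient, which is what keeps the two
    # buffers approximately independent in this sketch.
    m_new = beta1 ** 2 * m_prev2 + (1.0 - beta1 ** 2) * grad
    # Positive weight on the fresh buffer, negative weight on the other one.
    # The weights sum to 1; a larger beta0 amplifies the buffer difference.
    combined = (1.0 + beta0) * m_new - beta0 * m_prev
    # Normalize so that changing beta0 does not inflate the overall step scale.
    scale = math.sqrt((1.0 + beta0) ** 2 + beta0 ** 2)
    theta = theta - lr * combined / scale
    # The fresh buffer becomes m_prev, the old m_prev becomes m_prev2.
    return theta, m_new, m_prev


def minimize_quadratic(steps=500, theta=5.0):
    """Run the sketch on f(theta) = 0.5 * theta**2, whose gradient is theta."""
    m_prev, m_prev2 = 0.0, 0.0
    for _ in range(steps):
        grad = theta  # exact gradient; a real run would use a mini-batch gradient
        theta, m_prev, m_prev2 = pnm_step(theta, grad, m_prev, m_prev2)
    return theta
```

Setting `beta0=0.0` in this sketch removes the negative term, leaving a plain momentum update whose buffers alternate between steps. The noiseless quadratic only checks that the update is stable; it does not demonstrate the generalization effect claimed in the abstract.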
Author Information
Zeke Xie (The University of Tokyo/RIKEN)
He obtained a Ph.D. in machine learning and an M.E., both from The University of Tokyo, where he was fortunate to be supervised by Prof. Masashi Sugiyama and Prof. Issei Sato. He was also affiliated with RIKEN AIP during his Ph.D. studies. Before that, he obtained a Bachelor of Science from the University of Science and Technology of China. His research interests lie in machine learning, especially Deep Learning Theory, Imperfect Information Learning, and AI + Science.
Li Yuan (National University of Singapore)
Zhanxing Zhu (Peking University)
Masashi Sugiyama (RIKEN / The University of Tokyo)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Poster: Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization »
Tue. Jul 20th 04:00 -- 06:00 PM Room Virtual
More from the Same Authors
-
2023 : Invited Talk 3: Masashi Sugiyama (RIKEN & UTokyo) - Data distribution shift »
Masashi Sugiyama -
2023 : Enriching Disentanglement: Definitions to Metrics »
Yivan Zhang · Masashi Sugiyama -
2023 Poster: GAT: Guided Adversarial Training with Pareto-optimal Auxiliary Tasks »
Salah GHAMIZI · Jingfeng ZHANG · Maxime Cordy · Mike Papadakis · Masashi Sugiyama · YVES LE TRAON -
2023 Poster: MonoFlow: Rethinking Divergence GANs via the Perspective of Wasserstein Gradient Flows »
Mingxuan Yi · Zhanxing Zhu · Song Liu -
2023 Poster: Diversity-enhancing Generative Network for Few-shot Hypothesis Adaptation »
Ruijiang Dong · Feng Liu · Haoang Chi · Tongliang Liu · Mingming Gong · Gang Niu · Masashi Sugiyama · Bo Han -
2023 Poster: Optimality of Thompson Sampling with Noninformative Priors for Pareto Bandits »
Jongyeong Lee · Junya Honda · Chao-Kai Chiang · Masashi Sugiyama -
2023 Poster: A Category-theoretical Meta-analysis of Definitions of Disentanglement »
Yivan Zhang · Masashi Sugiyama -
2022 Poster: Adversarial Attack and Defense for Non-Parametric Two-Sample Tests »
Xilie Xu · Jingfeng Zhang · Feng Liu · Masashi Sugiyama · Mohan Kankanhalli -
2022 Poster: Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum »
Zeke Xie · Xinrui Wang · Huishuai Zhang · Issei Sato · Masashi Sugiyama -
2022 Poster: Sparse Double Descent: Where Network Pruning Aggravates Overfitting »
Zheng He · Zeke Xie · Quanzhi Zhu · Zengchang Qin -
2022 Spotlight: Adversarial Attack and Defense for Non-Parametric Two-Sample Tests »
Xilie Xu · Jingfeng Zhang · Feng Liu · Masashi Sugiyama · Mohan Kankanhalli -
2022 Spotlight: Sparse Double Descent: Where Network Pruning Aggravates Overfitting »
Zheng He · Zeke Xie · Quanzhi Zhu · Zengchang Qin -
2022 Oral: Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum »
Zeke Xie · Xinrui Wang · Huishuai Zhang · Issei Sato · Masashi Sugiyama -
2022 Poster: To Smooth or Not? When Label Smoothing Meets Noisy Labels »
Jiaheng Wei · Hangyu Liu · Tongliang Liu · Gang Niu · Masashi Sugiyama · Yang Liu -
2022 Oral: To Smooth or Not? When Label Smoothing Meets Noisy Labels »
Jiaheng Wei · Hangyu Liu · Tongliang Liu · Gang Niu · Masashi Sugiyama · Yang Liu -
2021 Workshop: ICML Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI »
Quanshi Zhang · Tian Han · Lixin Fan · Zhanxing Zhu · Hang Su · Ying Nian Wu -
2021 Poster: Provably End-to-end Label-noise Learning without Anchor Points »
Xuefeng Li · Tongliang Liu · Bo Han · Gang Niu · Masashi Sugiyama -
2021 Poster: Learning Diverse-Structured Networks for Adversarial Robustness »
Xuefeng Du · Jingfeng Zhang · Bo Han · Tongliang Liu · Yu Rong · Gang Niu · Junzhou Huang · Masashi Sugiyama -
2021 Poster: CIFS: Improving Adversarial Robustness of CNNs via Channel-wise Importance-based Feature Selection »
Hanshu YAN · Jingfeng Zhang · Gang Niu · Jiashi Feng · Vincent Tan · Masashi Sugiyama -
2021 Poster: Maximum Mean Discrepancy Test is Aware of Adversarial Attacks »
Ruize Gao · Feng Liu · Jingfeng Zhang · Bo Han · Tongliang Liu · Gang Niu · Masashi Sugiyama -
2021 Spotlight: CIFS: Improving Adversarial Robustness of CNNs via Channel-wise Importance-based Feature Selection »
Hanshu YAN · Jingfeng Zhang · Gang Niu · Jiashi Feng · Vincent Tan · Masashi Sugiyama -
2021 Spotlight: Provably End-to-end Label-noise Learning without Anchor Points »
Xuefeng Li · Tongliang Liu · Bo Han · Gang Niu · Masashi Sugiyama -
2021 Spotlight: Learning Diverse-Structured Networks for Adversarial Robustness »
Xuefeng Du · Jingfeng Zhang · Bo Han · Tongliang Liu · Yu Rong · Gang Niu · Junzhou Huang · Masashi Sugiyama -
2021 Spotlight: Maximum Mean Discrepancy Test is Aware of Adversarial Attacks »
Ruize Gao · Feng Liu · Jingfeng Zhang · Bo Han · Tongliang Liu · Gang Niu · Masashi Sugiyama -
2021 Poster: Mediated Uncoupled Learning: Learning Functions without Direct Input-output Correspondences »
Ikko Yamane · Junya Honda · Florian YGER · Masashi Sugiyama -
2021 Poster: Pointwise Binary Classification with Pairwise Confidence Comparisons »
Lei Feng · Senlin Shu · Nan Lu · Bo Han · Miao Xu · Gang Niu · Bo An · Masashi Sugiyama -
2021 Poster: Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification »
Nan Lu · Shida Lei · Gang Niu · Issei Sato · Masashi Sugiyama -
2021 Poster: Learning from Similarity-Confidence Data »
Yuzhou Cao · Lei Feng · Yitian Xu · Bo An · Gang Niu · Masashi Sugiyama -
2021 Poster: Confidence Scores Make Instance-dependent Label-noise Learning Possible »
Antonin Berthon · Bo Han · Gang Niu · Tongliang Liu · Masashi Sugiyama -
2021 Poster: Learning Noise Transition Matrix from Only Noisy Labels via Total Variation Regularization »
Yivan Zhang · Gang Niu · Masashi Sugiyama -
2021 Spotlight: Learning from Similarity-Confidence Data »
Yuzhou Cao · Lei Feng · Yitian Xu · Bo An · Gang Niu · Masashi Sugiyama -
2021 Spotlight: Pointwise Binary Classification with Pairwise Confidence Comparisons »
Lei Feng · Senlin Shu · Nan Lu · Bo Han · Miao Xu · Gang Niu · Bo An · Masashi Sugiyama -
2021 Spotlight: Mediated Uncoupled Learning: Learning Functions without Direct Input-output Correspondences »
Ikko Yamane · Junya Honda · Florian YGER · Masashi Sugiyama -
2021 Spotlight: Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification »
Nan Lu · Shida Lei · Gang Niu · Issei Sato · Masashi Sugiyama -
2021 Oral: Learning Noise Transition Matrix from Only Noisy Labels via Total Variation Regularization »
Yivan Zhang · Gang Niu · Masashi Sugiyama -
2021 Oral: Confidence Scores Make Instance-dependent Label-noise Learning Possible »
Antonin Berthon · Bo Han · Gang Niu · Tongliang Liu · Masashi Sugiyama -
2021 Poster: Lower-Bounded Proper Losses for Weakly Supervised Classification »
Shuhei M Yoshida · Takashi Takenouchi · Masashi Sugiyama -
2021 Poster: Classification with Rejection Based on Cost-sensitive Classification »
Nontawat Charoenphakdee · Zhenghang Cui · Yivan Zhang · Masashi Sugiyama -
2021 Spotlight: Classification with Rejection Based on Cost-sensitive Classification »
Nontawat Charoenphakdee · Zhenghang Cui · Yivan Zhang · Masashi Sugiyama -
2021 Spotlight: Lower-Bounded Proper Losses for Weakly Supervised Classification »
Shuhei M Yoshida · Takashi Takenouchi · Masashi Sugiyama -
2021 Poster: Large-Margin Contrastive Learning with Distance Polarization Regularizer »
Shuo Chen · Gang Niu · Chen Gong · Jun Li · Jian Yang · Masashi Sugiyama -
2021 Spotlight: Large-Margin Contrastive Learning with Distance Polarization Regularizer »
Shuo Chen · Gang Niu · Chen Gong · Jun Li · Jian Yang · Masashi Sugiyama -
2020 Poster: Few-shot Domain Adaptation by Causal Mechanism Transfer »
Takeshi Teshima · Issei Sato · Masashi Sugiyama -
2020 Poster: Do We Need Zero Training Loss After Achieving Zero Training Error? »
Takashi Ishida · Ikko Yamane · Tomoya Sakai · Gang Niu · Masashi Sugiyama -
2020 Poster: Progressive Identification of True Labels for Partial-Label Learning »
Jiaqi Lv · Miao Xu · LEI FENG · Gang Niu · Xin Geng · Masashi Sugiyama -
2020 Poster: On Breaking Deep Generative Model-based Defenses and Beyond »
Yanzhi Chen · Renjie Xie · Zhanxing Zhu -
2020 Poster: Online Dense Subgraph Discovery via Blurred-Graph Feedback »
Yuko Kuroki · Atsushi Miyauchi · Junya Honda · Masashi Sugiyama -
2020 Poster: SIGUA: Forgetting May Make Learning with Noisy Labels More Robust »
Bo Han · Gang Niu · Xingrui Yu · QUANMING YAO · Miao Xu · Ivor Tsang · Masashi Sugiyama -
2020 Poster: Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels »
Yu-Ting Chou · Gang Niu · Hsuan-Tien (Tien) Lin · Masashi Sugiyama -
2020 Poster: Attacks Which Do Not Kill Training Make Adversarial Learning Stronger »
Jingfeng Zhang · Xilie Xu · Bo Han · Gang Niu · Lizhen Cui · Masashi Sugiyama · Mohan Kankanhalli -
2020 Poster: Accelerating the diffusion-based ensemble sampling by non-reversible dynamics »
Futoshi Futami · Issei Sato · Masashi Sugiyama -
2020 Poster: Variational Imitation Learning with Diverse-quality Demonstrations »
Voot Tangkaratt · Bo Han · Mohammad Emtiyaz Khan · Masashi Sugiyama -
2020 Poster: Informative Dropout for Robust Representation Learning: A Shape-bias Perspective »
Baifeng Shi · Dinghuai Zhang · Qi Dai · Zhanxing Zhu · Yadong Mu · Jingdong Wang -
2020 Poster: Learning with Multiple Complementary Labels »
LEI FENG · Takuo Kaneko · Bo Han · Gang Niu · Bo An · Masashi Sugiyama -
2020 Poster: On the Noisy Gradient Descent that Generalizes as SGD »
Jingfeng Wu · Wenqing Hu · Haoyi Xiong · Jun Huan · Vladimir Braverman · Zhanxing Zhu -
2020 Poster: Normalized Flat Minima: Exploring Scale Invariant Definition of Flat Minima for Neural Networks Using PAC-Bayesian Analysis »
Yusuke Tsuzuku · Issei Sato · Masashi Sugiyama -
2019 Poster: Classification from Positive, Unlabeled and Biased Negative Data »
Yu-Guan Hsieh · Gang Niu · Masashi Sugiyama -
2019 Poster: Complementary-Label Learning for Arbitrary Losses and Models »
Takashi Ishida · Gang Niu · Aditya Menon · Masashi Sugiyama -
2019 Oral: Complementary-Label Learning for Arbitrary Losses and Models »
Takashi Ishida · Gang Niu · Aditya Menon · Masashi Sugiyama -
2019 Oral: Classification from Positive, Unlabeled and Biased Negative Data »
Yu-Guan Hsieh · Gang Niu · Masashi Sugiyama -
2019 Poster: How does Disagreement Help Generalization against Label Corruption? »
Xingrui Yu · Bo Han · Jiangchao Yao · Gang Niu · Ivor Tsang · Masashi Sugiyama -
2019 Poster: Interpreting Adversarially Trained Convolutional Neural Networks »
Tianyuan Zhang · Zhanxing Zhu -
2019 Poster: The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects »
Zhanxing Zhu · Jingfeng Wu · Bing Yu · Lei Wu · Jinwen Ma -
2019 Oral: Interpreting Adversarially Trained Convolutional Neural Networks »
Tianyuan Zhang · Zhanxing Zhu -
2019 Oral: How does Disagreement Help Generalization against Label Corruption? »
Xingrui Yu · Bo Han · Jiangchao Yao · Gang Niu · Ivor Tsang · Masashi Sugiyama -
2019 Oral: The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects »
Zhanxing Zhu · Jingfeng Wu · Bing Yu · Lei Wu · Jinwen Ma -
2019 Poster: Imitation Learning from Imperfect Demonstration »
Yueh-Hua Wu · Nontawat Charoenphakdee · Han Bao · Voot Tangkaratt · Masashi Sugiyama -
2019 Poster: On Symmetric Losses for Learning from Corrupted Labels »
Nontawat Charoenphakdee · Jongyeong Lee · Masashi Sugiyama -
2019 Oral: Imitation Learning from Imperfect Demonstration »
Yueh-Hua Wu · Nontawat Charoenphakdee · Han Bao · Voot Tangkaratt · Masashi Sugiyama -
2019 Oral: On Symmetric Losses for Learning from Corrupted Labels »
Nontawat Charoenphakdee · Jongyeong Lee · Masashi Sugiyama -
2018 Poster: Classification from Pairwise Similarity and Unlabeled Data »
Han Bao · Gang Niu · Masashi Sugiyama -
2018 Oral: Classification from Pairwise Similarity and Unlabeled Data »
Han Bao · Gang Niu · Masashi Sugiyama -
2018 Poster: Does Distributionally Robust Supervised Learning Give Robust Classifiers? »
Weihua Hu · Gang Niu · Issei Sato · Masashi Sugiyama -
2018 Oral: Does Distributionally Robust Supervised Learning Give Robust Classifiers? »
Weihua Hu · Gang Niu · Issei Sato · Masashi Sugiyama -
2018 Poster: Analysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model »
Hideaki Imamura · Issei Sato · Masashi Sugiyama -
2018 Oral: Analysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model »
Hideaki Imamura · Issei Sato · Masashi Sugiyama -
2017 Poster: Learning Discrete Representations via Information Maximizing Self-Augmented Training »
Weihua Hu · Takeru Miyato · Seiya Tokui · Eiichi Matsumoto · Masashi Sugiyama -
2017 Talk: Learning Discrete Representations via Information Maximizing Self-Augmented Training »
Weihua Hu · Takeru Miyato · Seiya Tokui · Eiichi Matsumoto · Masashi Sugiyama -
2017 Poster: Semi-Supervised Classification Based on Classification from Positive and Unlabeled Data »
Tomoya Sakai · Marthinus C du Plessis · Gang Niu · Masashi Sugiyama -
2017 Talk: Semi-Supervised Classification Based on Classification from Positive and Unlabeled Data »
Tomoya Sakai · Marthinus C du Plessis · Gang Niu · Masashi Sugiyama