Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
Zeke Xie · Li Yuan · Zhanxing Zhu · Masashi Sugiyama

Tue Jul 20 09:00 AM -- 11:00 AM (PDT) @ Virtual

It is well known that stochastic gradient noise (SGN) acts as implicit regularization for deep learning and is essential for both the optimization and the generalization of deep networks. Some works have attempted to simulate SGN artificially by injecting random noise to improve deep learning. However, it turned out that such simple injected noise does not work as well as SGN, which is anisotropic and parameter-dependent. To simulate SGN at low computational cost and without changing the learning rate or batch size, we propose the Positive-Negative Momentum (PNM) approach, a powerful alternative to conventional Momentum in classic optimizers. PNM maintains two approximately independent momentum terms, so the magnitude of SGN can be controlled explicitly by adjusting the momentum difference. We theoretically prove the convergence guarantee and the generalization advantage of PNM over Stochastic Gradient Descent (SGD). By incorporating PNM into two conventional optimizers, SGD with Momentum and Adam, our extensive experiments empirically verify the significant advantage of the PNM-based variants over the corresponding conventional Momentum-based optimizers. Code: https://github.com/zeke-xie/Positive-Negative-Momentum
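The abstract describes PNM only at a high level. Below is a minimal Python sketch of SGD with Positive-Negative Momentum on a single scalar parameter, assuming the update rule as we read it from the paper; it is not the official implementation (see the linked repository for that). In this sketch, two momentum buffers are each updated on alternating steps with decay beta1**2, and the step combines them as (1 + beta0) times the freshly updated buffer minus beta0 times the other, divided by sqrt((1 + beta0)**2 + beta0**2). The function name pnm_sgd, its default values, and the noisy quadratic test problem are hypothetical choices for illustration only.

```python
import math
import random


def pnm_sgd(grad_fn, theta, lr=0.1, beta0=1.0, beta1=0.9, steps=100):
    """Scalar sketch of SGD with Positive-Negative Momentum (assumed update rule).

    m_t         = beta1**2 * m_{t-2} + (1 - beta1**2) * g_t
    theta_{t+1} = theta_t - lr * ((1 + beta0) * m_t - beta0 * m_{t-1})
                  / sqrt((1 + beta0)**2 + beta0**2)
    """
    buffers = [0.0, 0.0]  # two momentum buffers, updated on alternating steps
    # Normalizer as we understand it from the paper; it rescales the combined step.
    norm = math.sqrt((1 + beta0) ** 2 + beta0 ** 2)
    for t in range(steps):
        g = grad_fn(theta)
        pos = t % 2    # buffer that absorbs this step's gradient ("positive" momentum)
        neg = 1 - pos  # buffer last updated one step earlier ("negative" momentum)
        buffers[pos] = beta1 ** 2 * buffers[pos] + (1 - beta1 ** 2) * g
        step = ((1 + beta0) * buffers[pos] - beta0 * buffers[neg]) / norm
        theta -= lr * step
    return theta


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical test problem: f(theta) = 0.5 * theta**2 with Gaussian gradient
    # noise standing in for SGN.
    noisy_grad = lambda x: x + random.gauss(0.0, 0.1)
    print(pnm_sgd(noisy_grad, theta=5.0, steps=2000))
```

In this sketch, each buffer only ever averages every other gradient, so the two buffers are built from disjoint sets of noisy gradients; this is one way to read "approximately independent momentum terms". The parameter beta0 sets how strongly the difference between the buffers is weighted, which corresponds to the "momentum difference" knob mentioned in the abstract.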

Author Information

Zeke Xie (The University of Tokyo/RIKEN)

I am a machine learning Ph.D. student at Sugiyama Lab and Issei Sato Lab, The University of Tokyo. I am jointly supervised by Prof. Masashi Sugiyama and Prof. Issei Sato. I obtained a Bachelor of Science from the University of Science and Technology of China and a Master of Engineering from The University of Tokyo. I was also fortunate enough to collaborate with Prof. Dacheng Tao, Dr. Huishuai Zhang, and Prof. Zhanxing Zhu. My research interests mainly include Deep Learning Theory, Weakly Supervised Learning, and Trustworthy AI (including Robustness and Uncertainty).

Li Yuan (National University of Singapore)
Zhanxing Zhu (Peking University)
Masashi Sugiyama (RIKEN / The University of Tokyo)
