Despite their overwhelming capacity to overfit, deep neural networks trained by specific optimization algorithms tend to generalize relatively well to unseen data. Recently, researchers have explained this by investigating the implicit bias of optimization algorithms. A remarkable piece of progress is the work of Lyu & Li (2019), which proves that gradient descent (GD) maximizes the margin of homogeneous deep neural networks. Besides first-order optimization algorithms like GD, adaptive algorithms such as AdaGrad, RMSProp, and Adam are popular owing to their rapid training process. Meanwhile, numerous works have provided empirical evidence that adaptive methods may suffer from poor generalization performance. However, a theoretical explanation for the generalization of adaptive optimization algorithms is still lacking. In this paper, we study the implicit bias of adaptive optimization algorithms on homogeneous neural networks. In particular, we study the convergent direction of the parameters when they optimize the logistic loss. We prove that the convergent direction of Adam and RMSProp is the same as that of GD, while for AdaGrad the convergent direction depends on the adaptive conditioner. Technically, we provide a unified framework for analyzing the convergent direction of adaptive optimization algorithms by constructing a novel and nontrivial adaptive gradient flow and surrogate margin. The theoretical findings explain the superior generalization of the exponential moving average strategy adopted by RMSProp and Adam. To the best of our knowledge, this is the first work to study the convergent direction of adaptive optimization algorithms on non-linear deep neural networks.
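To make the distinction in the abstract concrete, the following minimal NumPy sketch contrasts the two kinds of adaptive conditioners it refers to: AdaGrad's accumulated sum of squared gradients versus the exponential moving average used by RMSProp (and, together with a first-moment average, by Adam). The function names and default hyperparameters are illustrative assumptions, and the sketch shows only the standard textbook update rules, not the adaptive gradient flow or surrogate margin constructed in the paper's analysis.

```python
import numpy as np

# Illustrative update rules (hypothetical helpers, not the paper's code).

def adagrad_step(w, grad, accum, lr=1e-2, eps=1e-8):
    """AdaGrad: the conditioner is the running SUM of squared gradients,
    so the entire gradient history is retained and the effective step
    size keeps shrinking coordinate-wise."""
    accum = accum + grad ** 2
    w = w - lr * grad / (np.sqrt(accum) + eps)
    return w, accum

def rmsprop_step(w, grad, avg, lr=1e-2, beta=0.9, eps=1e-8):
    """RMSProp: the conditioner is an EXPONENTIAL MOVING AVERAGE of squared
    gradients, so old gradients are forgotten geometrically; Adam adds an
    EMA of the first moment on top of this second-moment estimate."""
    avg = beta * avg + (1 - beta) * grad ** 2
    w = w - lr * grad / (np.sqrt(avg) + eps)
    return w, avg
```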
Author Information
Bohan Wang (Microsoft Research Asia)
Qi Meng (Microsoft)
Wei Chen (Microsoft Research)
Tie-Yan Liu (Microsoft Research Asia)
Tie-Yan Liu is a principal researcher at Microsoft Research Asia, leading the research on artificial intelligence and machine learning. He is very well known for his pioneering work on learning to rank and computational advertising, and his recent research interests include deep learning, reinforcement learning, and distributed machine learning. Many of his technologies have been transferred to Microsoft's products and online services (such as Bing, Microsoft Advertising, and Azure), and open-sourced through Microsoft Cognitive Toolkit (CNTK), Microsoft Distributed Machine Learning Toolkit (DMTK), and Microsoft Graph Engine. He has also been actively contributing to the academic community. He is an adjunct/honorary professor at Carnegie Mellon University (CMU), University of Nottingham, and several other universities in China. His papers have been cited tens of thousands of times in refereed conferences and journals. He has won a number of awards, including the best student paper award at SIGIR (2008), the most cited paper award at the Journal of Visual Communications and Image Representation (2004-2006), the research breakthrough award (2012) and research-team-of-the-year award (2017) at Microsoft Research, Top-10 Springer Computer Science books by Chinese authors (2015), and the most cited Chinese researcher by Elsevier (2017). He has been invited to serve as general chair, program committee chair, local chair, or area chair for a dozen top conferences including SIGIR, WWW, KDD, ICML, NIPS, IJCAI, AAAI, ACL, and ICTIR, as well as associate editor of ACM Transactions on Information Systems, ACM Transactions on the Web, and Neurocomputing. Tie-Yan Liu is a fellow of the IEEE, a distinguished member of the ACM, and a vice chair of the CIPS information retrieval technical committee.
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Poster: The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous Neural Networks »
Thu. Jul 22nd 04:00 -- 06:00 AM, Room: Virtual
More from the Same Authors
-
2022 Poster: SE(3) Equivariant Graph Neural Networks with Complete Local Frames »
weitao du · He Zhang · Yuanqi Du · Qi Meng · Wei Chen · Nanning Zheng · Bin Shao · Tie-Yan Liu
-
2022 Spotlight: SE(3) Equivariant Graph Neural Networks with Complete Local Frames »
weitao du · He Zhang · Yuanqi Du · Qi Meng · Wei Chen · Nanning Zheng · Bin Shao · Tie-Yan Liu
-
2022 Poster: Analyzing and Mitigating Interference in Neural Architecture Search »
Jin Xu · Xu Tan · Kaitao Song · Renqian Luo · Yichong Leng · Tao Qin · Tie-Yan Liu · Jian Li
-
2022 Poster: Supervised Off-Policy Ranking »
Yue Jin · Yue Zhang · Tao Qin · Xudong Zhang · Jian Yuan · Houqiang Li · Tie-Yan Liu
-
2022 Spotlight: Supervised Off-Policy Ranking »
Yue Jin · Yue Zhang · Tao Qin · Xudong Zhang · Jian Yuan · Houqiang Li · Tie-Yan Liu
-
2022 Spotlight: Analyzing and Mitigating Interference in Neural Architecture Search »
Jin Xu · Xu Tan · Kaitao Song · Renqian Luo · Yichong Leng · Tao Qin · Tie-Yan Liu · Jian Li
-
2021 Poster: Large Scale Private Learning via Low-rank Reparametrization »
Da Yu · Huishuai Zhang · Wei Chen · Jian Yin · Tie-Yan Liu
-
2021 Spotlight: Large Scale Private Learning via Low-rank Reparametrization »
Da Yu · Huishuai Zhang · Wei Chen · Jian Yin · Tie-Yan Liu
-
2021 Poster: Temporally Correlated Task Scheduling for Sequence Learning »
Xueqing Wu · Lewen Wang · Yingce Xia · Weiqing Liu · Lijun Wu · Shufang Xie · Tao Qin · Tie-Yan Liu
-
2021 Poster: GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training »
Tianle Cai · Shengjie Luo · Keyulu Xu · Di He · Tie-Yan Liu · Liwei Wang
-
2021 Spotlight: GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training »
Tianle Cai · Shengjie Luo · Keyulu Xu · Di He · Tie-Yan Liu · Liwei Wang
-
2021 Spotlight: Temporally Correlated Task Scheduling for Sequence Learning »
Xueqing Wu · Lewen Wang · Yingce Xia · Weiqing Liu · Lijun Wu · Shufang Xie · Tao Qin · Tie-Yan Liu
-
2021 : Privacy in learning: Basics and the interplay »
Huishuai Zhang · Wei Chen
-
2020 Poster: On Layer Normalization in the Transformer Architecture »
Ruibin Xiong · Yunchang Yang · Di He · Kai Zheng · Shuxin Zheng · Chen Xing · Huishuai Zhang · Yanyan Lan · Liwei Wang · Tie-Yan Liu
-
2020 Poster: Sequence Generation with Mixed Representations »
Lijun Wu · Shufang Xie · Yingce Xia · Yang Fan · Jian-Huang Lai · Tao Qin · Tie-Yan Liu
-
2019 Poster: Efficient Training of BERT by Progressively Stacking »
Linyuan Gong · Di He · Zhuohan Li · Tao Qin · Liwei Wang · Tie-Yan Liu
-
2019 Oral: Efficient Training of BERT by Progressively Stacking »
Linyuan Gong · Di He · Zhuohan Li · Tao Qin · Liwei Wang · Tie-Yan Liu
-
2018 Poster: Towards Binary-Valued Gates for Robust LSTM Training »
Zhuohan Li · Di He · Fei Tian · Wei Chen · Tao Qin · Liwei Wang · Tie-Yan Liu
-
2018 Oral: Towards Binary-Valued Gates for Robust LSTM Training »
Zhuohan Li · Di He · Fei Tian · Wei Chen · Tao Qin · Liwei Wang · Tie-Yan Liu
-
2017 Poster: Asynchronous Stochastic Gradient Descent with Delay Compensation »
Shuxin Zheng · Qi Meng · Taifeng Wang · Wei Chen · Nenghai Yu · Zhiming Ma · Tie-Yan Liu
-
2017 Talk: Asynchronous Stochastic Gradient Descent with Delay Compensation »
Shuxin Zheng · Qi Meng · Taifeng Wang · Wei Chen · Nenghai Yu · Zhiming Ma · Tie-Yan Liu
-
2017 Poster: Dual Supervised Learning »
Yingce Xia · Tao Qin · Wei Chen · Jiang Bian · Nenghai Yu · Tie-Yan Liu
-
2017 Talk: Dual Supervised Learning »
Yingce Xia · Tao Qin · Wei Chen · Jiang Bian · Nenghai Yu · Tie-Yan Liu