Deep neural networks have become the state-of-the-art models in numerous machine learning tasks. However, general guidance for network architecture design is still missing. In our work, we bridge deep neural network design with numerical differential equations. We show that many effective networks, such as ResNet, PolyNet, FractalNet and RevNet, can be interpreted as different numerical discretizations of differential equations. This finding brings us a brand new perspective on the design of effective deep architectures, and allows us to draw on the rich knowledge of numerical analysis to guide the design of new and potentially more effective deep networks. As an example, we propose a linear multi-step architecture (LM-architecture), which is inspired by the linear multi-step method for solving ordinary differential equations. The LM-architecture is an effective structure that can be applied to any ResNet-like network. In particular, we demonstrate that LM-ResNet and LM-ResNeXt (i.e. the networks obtained by applying the LM-architecture to ResNet and ResNeXt respectively) achieve noticeably higher accuracy than ResNet and ResNeXt on both CIFAR and ImageNet with comparable numbers of trainable parameters. Moreover, on both CIFAR and ImageNet, LM-ResNet/LM-ResNeXt can significantly compress (>50%) the original networks while maintaining similar performance. This can be explained mathematically using the concept of modified equations from numerical analysis. Last but not least, we establish a connection between stochastic control and noise injection in the training process, which helps to improve the generalization of the networks. Furthermore, by relating stochastic training strategies to stochastic dynamical systems, we can easily apply stochastic training to networks with the LM-architecture. As an example, we introduce stochastic depth into LM-ResNet and achieve significant improvement over the original LM-ResNet on CIFAR10.
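The core correspondence can be sketched in a few lines: a ResNet block computes x_{n+1} = x_n + f(x_n), which is one forward-Euler step of the ODE x' = f(x), while a two-step linear multi-step scheme combines the two most recent states with a trainable coefficient. The minimal numpy sketch below is illustrative only — the residual map `f` and the coefficient `k` stand in for a trained residual branch and a learned scalar, and the exact LM update form should be checked against the paper:

```python
import numpy as np

def resnet_step(x, f):
    # ResNet block viewed as a forward Euler step of x' = f(x):
    # x_{n+1} = x_n + f(x_n)
    return x + f(x)

def lm_resnet_step(x_prev, x_curr, f, k):
    # LM-architecture block (two-step linear multi-step form):
    # x_{n+1} = (1 - k) * x_n + k * x_{n-1} + f(x_n),
    # where k is a trainable scalar (one per block in the paper's setting).
    # With k = 0 this reduces to the plain ResNet/forward-Euler step.
    return (1.0 - k) * x_curr + k * x_prev + f(x_curr)

# Toy residual map standing in for a trained residual branch.
f = lambda x: 0.1 * np.tanh(x)

x0 = np.ones(4)
x1 = resnet_step(x0, f)                   # plain ResNet update
x2 = lm_resnet_step(x0, x1, f, k=-0.5)    # LM update using two past states
```

The extra state `x_prev` is what distinguishes the LM-architecture from a standard residual block: each block consumes the outputs of the two preceding blocks rather than one, at the cost of a single additional scalar parameter per block.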
Yiping Lu (Peking University)
Aoxiao Zhong (Zhejiang University)
Quanzheng Li (Mass General Hospital, Harvard Medical School)
Bin Dong (Peking University)
I received my B.S. from Peking University in 2003, my M.Sc. from the National University of Singapore in 2005, and my Ph.D. from the University of California, Los Angeles (UCLA) in 2009. I then spent 2 years at the University of California, San Diego (UCSD) as a visiting assistant professor. I was a tenure-track assistant professor at the University of Arizona from 2011 until I joined Peking University as an associate professor in 2014. I received the Qiu Shi Outstanding Young Scholar award in 2014 and the award of the Project of Thousand Youth Talents of China in 2015. My research interest is in mathematical modeling and computation in imaging and data science, which includes (but is not limited to) biological and medical imaging and image analysis, image-guided diagnosis and treatment of disease, and (semi-)supervised learning. A special feature of my research is blending different branches of mathematics, which includes bridging wavelet frame theory, variational techniques, and nonlinear PDEs, and combining sparse approximation and partial differential equations with machine and deep learning. We are working on projects aiming at addressing new and fascinating connections among these subjects, which not only lead to new understandings of the subjects themselves, but also give rise to new and effective mathematical and computational tools for imaging/data science.
Related Events (a corresponding poster, oral, or spotlight)
2018 Oral: Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations »
Thu Jul 12th 02:30 -- 02:40 PM Room Victoria