Timezone: »

Jumpout : Improved Dropout for Deep Neural Networks with ReLUs
Shengjie Wang · Tianyi Zhou · Jeff Bilmes

Tue Jun 11 05:10 PM -- 05:15 PM (PDT) @ Hall A

Dropout is a simple and effective way to improve the generalization performance of deep neural networks (DNNs) and prevent overfitting. This paper discusses three novel observations about dropout when applied to DNNs with rectified linear unit (ReLU): 1) dropout encourages each local linear model of a DNN to be trained on data points from nearby regions; 2) applying the same dropout rate to different layers can result in significantly different (effective) deactivation rates; and 3) when batch normalization is also used, the rescaling factor of dropout causes a normalization inconsistency between training and testing. The above leads to three simple but nontrivial dropout modifications resulting in our proposed method ``jumpout.'' Jumpout samples the dropout rate from a monotone decreasing distribution (e.g., the right half of a Gaussian), so each local linear model is trained, with high probability, to work better for data points from nearby than from more distant regions. Jumpout moreover adaptively normalizes the dropout rate at each layer and every training batch, so the effective deactivation rate applied to the activated neurons are kept the same. Furthermore, it rescales the outputs for a better trade-off that keeps both the variance and mean of neurons more consistent between training and test phases, thereby mitigating the incompatibility between dropout and batch normalization. Jumpout shows significantly improved performance on CIFAR10, CIFAR100, Fashion-MNIST, STL10, SVHN, ImageNet-1k, etc., while introducing negligible additional memory and computation costs.

Author Information

Shengjie Wang ("University of Washington, Seattle")
Tianyi Zhou (University of Washington)

Tianyi Zhou is currently a PhD student at Paul G. Allen school of Computer Science and Engineering, University of Washington. He is supervised by Prof. Jeff Bilmes and Prof. Carlos Guestrin. He published ~50 papers at NeurIPS, ICML, ICLR, AISTATS, NAACL, KDD, ICDM, IJCAI, AAAI, ISIT, Machine Learning Journal, IEEE TIP, IEEE TNNLS, IEEE TKDE, etc, with ~1700 citations. He is the recipient of the Best student paper award at ICDM 2013.

Jeff Bilmes (UW)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors