Timezone: »

Jumpout : Improved Dropout for Deep Neural Networks with ReLUs
Shengjie Wang · Tianyi Zhou · Jeff Bilmes

Tue Jun 11 06:30 PM -- 09:00 PM (PDT) @ Pacific Ballroom #29

We discuss three novel insights about dropout for DNNs with ReLUs: 1) dropout encourages each local linear piece of a DNN to be trained on data points from nearby regions; 2) the same dropout rate results in different (effective) deactivation rates for layers with different portions of ReLU-deactivated neurons; and 3) the rescaling factor of dropout causes a normalization inconsistency between training and test when used together with batch normalization. The above leads to three simple but nontrivial modifications resulting in our method ``jumpout.'' Jumpout samples the dropout rate from a monotone decreasing distribution (e.g., the right half of a Gaussian), so each local linear piece is trained, with high probability, to work better for data points from nearby than more distant regions. Jumpout moreover adaptively normalizes the dropout rate at each layer and every training batch, so the effective deactivation rate on the activated neurons is kept the same. Furthermore, it rescales the outputs for a better trade-off that keeps both the variance and mean of neurons more consistent between training and test phases, thereby mitigating the incompatibility between dropout and batch normalization. Jumpout significantly improves the performance of different neural nets on CIFAR10, CIFAR100, Fashion-MNIST, STL10, SVHN, ImageNet-1k, etc., while introducing negligible additional memory and computation costs.

Author Information

Shengjie Wang ("University of Washington, Seattle")
Tianyi Zhou (University of Washington)
Tianyi Zhou

Tianyi Zhou is a tenure-track assistant professor of Computer Science and UMIACS at the University of Maryland, College Park. He received his Ph.D. from the University of Washington, Seattle. His research interests are machine learning, optimization, and natural language processing. His recent works focus on curriculum learning, hybrid human-artificial intelligence, trustworthy and robust AI, plasticity-stability trade-off in ML, large language and multi-modality models, reinforcement learning, federated learning, and meta-learning. He has published ~90 papers at NeurIPS, ICML, ICLR, AISTATS, ACL, EMNLP, NAACL, COLING, CVPR, KDD, ICDM, AAAI, IJCAI, ISIT, Machine Learning (Springer), IEEE TIP/TNNLS/TKDE, etc. He is the recipient of the Best Student Paper Award at ICDM 2013 and the 2020 IEEE TCSC Most Influential Paper Award. He served as an SPC member or area chair in AAAI, IJCAI, KDD, WACV, etc. Tianyi was a visiting research scientist at Google and a research intern at Microsoft Research Redmond and Yahoo! Labs.

Jeff Bilmes (UW)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors