Dropout is a simple and effective way to improve the generalization performance of deep neural networks (DNNs) and prevent overfitting. This paper discusses three novel observations about dropout when applied to DNNs with rectified linear units (ReLUs): 1) dropout encourages each local linear model of a DNN to be trained on data points from nearby regions; 2) applying the same dropout rate to different layers can result in significantly different (effective) deactivation rates; and 3) when batch normalization is also used, the rescaling factor of dropout causes a normalization inconsistency between training and testing. These observations lead to three simple but nontrivial dropout modifications, which together form our proposed method, "jumpout." Jumpout samples the dropout rate from a monotone decreasing distribution (e.g., the right half of a Gaussian), so each local linear model is trained, with high probability, to work better for data points from nearby regions than for those from more distant regions. Jumpout moreover adaptively normalizes the dropout rate at each layer and every training batch, so the effective deactivation rate applied to the activated neurons is kept the same. Furthermore, it rescales the outputs for a better tradeoff that keeps both the variance and mean of neurons more consistent between training and test phases, thereby mitigating the incompatibility between dropout and batch normalization. Jumpout shows significantly improved performance on CIFAR10, CIFAR100, Fashion-MNIST, STL10, SVHN, ImageNet-1k, etc., while introducing negligible additional memory and computation costs.
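The three modifications described in the abstract can be sketched in a few lines of plain Python. This is only an illustrative reading of the abstract, not the authors' implementation: the function name `jumpout`, the parameters `p_min`, `p_max`, `sigma`, and `exponent`, their default values, and the choice to normalize by dividing the sampled rate by the fraction of active units are all assumptions made for the example.

```python
import random


def jumpout(h, p_min=0.01, p_max=0.6, sigma=0.05, exponent=0.75, rng=None):
    """Apply a jumpout-style mask to a list of post-ReLU activations.

    All parameter names and defaults are hypothetical; they only
    illustrate the three ideas stated in the abstract.
    Returns (masked activations, dropout rate actually applied).
    """
    rng = rng or random.Random()

    # 1) Sample the dropout rate from a monotone decreasing distribution:
    #    the right half of a Gaussian, shifted by p_min and clipped at p_max.
    p = min(p_min + abs(rng.gauss(0.0, sigma)), p_max)

    # 2) Normalize the rate by the fraction of currently active units
    #    (assumed form), so layers with different sparsity see a
    #    comparable effective deactivation rate.
    q_plus = sum(1 for v in h if v > 0) / len(h)
    if q_plus == 0:
        return list(h), 0.0  # nothing is active, so nothing to drop
    p_eff = min(p / q_plus, p_max)

    # 3) Rescale kept units by (1 - p_eff) ** -exponent. An exponent
    #    between 0.5 and 1 trades off preserving the mean (exponent 1)
    #    against preserving the variance (exponent 0.5).
    keep = 1.0 - p_eff
    scale = keep ** (-exponent)
    out = [v * scale if rng.random() < keep else 0.0 for v in h]
    return out, p_eff
```

Calling `jumpout([0.0, 1.0, 2.0, 0.0, 3.0, 0.5])` during training returns the masked activations together with the rate that was applied; at test time the activations would be passed through unchanged, as with standard dropout.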
Author Information
Shengjie Wang (University of Washington, Seattle)
Tianyi Zhou (University of Washington)
Tianyi Zhou is a tenure-track assistant professor of Computer Science and UMIACS at the University of Maryland, College Park. He received his Ph.D. from the University of Washington, Seattle. His research interests are machine learning, optimization, and natural language processing. His recent work focuses on curriculum learning, hybrid human-artificial intelligence, trustworthy and robust AI, the plasticity-stability tradeoff in ML, large language and multi-modality models, reinforcement learning, federated learning, and meta-learning. He has published ~90 papers at NeurIPS, ICML, ICLR, AISTATS, ACL, EMNLP, NAACL, COLING, CVPR, KDD, ICDM, AAAI, IJCAI, ISIT, Machine Learning (Springer), IEEE TIP/TNNLS/TKDE, etc. He is the recipient of the Best Student Paper Award at ICDM 2013 and the 2020 IEEE TCSC Most Influential Paper Award. He has served as an SPC member or area chair for AAAI, IJCAI, KDD, WACV, etc. Tianyi was a visiting research scientist at Google and a research intern at Microsoft Research Redmond and Yahoo! Labs.
Jeff Bilmes (UW)
Related Events (a corresponding poster, oral, or spotlight)

2019 Poster: Jumpout: Improved Dropout for Deep Neural Networks with ReLUs »
Wed. Jun 12th 01:30 - 04:00 AM Room Pacific Ballroom #29
More from the Same Authors

2021 : Tighter m-DPP Coreset Sample Complexity Bounds »
Gantavya Bhatt · Jeff Bilmes 
2022 : Vote for Nearest Neighbors Meta-Pruning of Self-Supervised Networks »
Haiyan Zhao · Tianyi Zhou · Guodong Long · Jing Jiang · Chengqi Zhang 
2022 : Federated Learning from Pre-Trained Models: A Contrastive Learning Approach »
Yue Tan · Guodong Long · Jie Ma · LU LIU · Tianyi Zhou · Jing Jiang 
2023 : Accelerating Batch Active Learning Using Continual Learning Techniques »
Gantavya Bhatt · Arnav M Das · · Rui Yang · Vianne Gao · Jeff Bilmes 
2023 : Taming Small-sample Bias in Low-budget Active Learning »
Linxin Song · Jieyu Zhang · Xiaotian Lu · Tianyi Zhou 
2023 Poster: Structured Cooperative Learning with Graphical Model Priors »
Shuangtong Li · Tianyi Zhou · Xinmei Tian · Dacheng Tao 
2023 Poster: Does Continual Learning Equally Forget All Parameters? »
Haiyan Zhao · Tianyi Zhou · Guodong Long · Jing Jiang · Chengqi Zhang 
2023 Poster: Continual Task Allocation in Meta-Policy Network via Sparse Prompting »
Yijun Yang · Tianyi Zhou · Jing Jiang · Guodong Long · Yuhui Shi 
2022 : Does Continual Learning Equally Forget All Parameters? »
Haiyan Zhao · Tianyi Zhou · Guodong Long · Jing Jiang · Chengqi Zhang 
2021 : Tighter m-DPP Coreset Sample Complexity Bounds »
Jeff Bilmes · Gantavya Bhatt 
2021 : More Information, Less Data »
Jeff Bilmes 
2021 : Introduction by the Organizers »
Abir De · Rishabh Iyer · Ganesh Ramakrishnan · Jeff Bilmes 
2021 Workshop: Subset Selection in Machine Learning: From Theory to Applications »
Rishabh Iyer · Abir De · Ganesh Ramakrishnan · Jeff Bilmes 
2020 Poster: Coresets for Data-efficient Training of Machine Learning Models »
Baharan Mirzasoleiman · Jeff Bilmes · Jure Leskovec 
2020 Poster: Time-Consistent Self-Supervision for Semi-Supervised Learning »
Tianyi Zhou · Shengjie Wang · Jeff Bilmes 
2019 : Jeff Bilmes: Deep Submodular Synergies »
Jeff Bilmes 
2019 Poster: Bias Also Matters: Bias Attribution for Deep Neural Network Explanation »
Shengjie Wang · Tianyi Zhou · Jeff Bilmes 
2019 Oral: Bias Also Matters: Bias Attribution for Deep Neural Network Explanation »
Shengjie Wang · Tianyi Zhou · Jeff Bilmes 
2019 Poster: Combating Label Noise in Deep Learning using Abstention »
Sunil Thulasidasan · Tanmoy Bhattacharya · Jeff Bilmes · Gopinath Chennupati · Jamal Mohd-Yusof 
2019 Oral: Combating Label Noise in Deep Learning using Abstention »
Sunil Thulasidasan · Tanmoy Bhattacharya · Jeff Bilmes · Gopinath Chennupati · Jamal Mohd-Yusof 
2018 Poster: Constrained Interacting Submodular Groupings »
Andrew Cotter · Mahdi Milani Fard · Seungil You · Maya Gupta · Jeff Bilmes 
2018 Poster: Greed is Still Good: Maximizing Monotone Submodular+Supermodular (BP) Functions »
Wenruo Bai · Jeff Bilmes 
2018 Oral: Constrained Interacting Submodular Groupings »
Andrew Cotter · Mahdi Milani Fard · Seungil You · Maya Gupta · Jeff Bilmes 
2018 Oral: Greed is Still Good: Maximizing Monotone Submodular+Supermodular (BP) Functions »
Wenruo Bai · Jeff Bilmes