Oral
Using Pre-Training Can Improve Model Robustness and Uncertainty
Dan Hendrycks · Kimin Lee · Mantas Mazeika
Tuning a pre-trained network is commonly thought to improve data efficiency. However, Kaiming He et al. (2018) have called into question the utility of pre-training by showing that training from scratch can often yield similar performance, should the model train long enough. We show that although pre-training may not improve performance on traditional classification metrics, it does provide large benefits to model robustness and uncertainty. Through extensive experiments on label corruption, class imbalance, adversarial examples, out-of-distribution detection, and confidence calibration, we demonstrate large gains from pre-training and complementary effects with task-specific methods. Results include a 30% relative improvement in label noise robustness and a 10% absolute improvement in adversarial robustness on both CIFAR-10 and CIFAR-100. In some cases, using pre-training without task-specific methods surpasses the state-of-the-art, highlighting the importance of using pre-training when evaluating future methods on robustness and uncertainty tasks.