Timezone: »

Improving adversarial robustness via joint classification and multiple explicit detection classes
Sina Baharlouei · Fatemeh Sheikholeslami · Meisam Razaviyayn · Zico Kolter

This work concerns the development of deep networks that are certifiably robust to adversarial attacks. Joint robust classification-detection was recently introduced as a certified defense mechanism, where adversarial examples are either correctly classified or assigned to the abstain'' class. In this work, we show that such a provable framework can be extended to networks with multiple explicit abstain classes, where the adversarial examples are adaptively assigned to those. While naively adding multiple abstain classes can lead tomodel degeneracy'', we propose a regularization approach and a training method to counter this degeneracy by promoting full use of the multiple abstain classes. Our experiments demonstrate that the proposed approach consistently achieves favorable natural vs. robust verified accuracy tradeoff, outperforming state-of-the-art algorithms for various choices of number of abstain classes.