
Entropy Weighted Adversarial Training
Minseon Kim · Jihoon Tack · Jinwoo Shin · Sung Ju Hwang

Adversarial training methods, which minimize the loss of adversarially perturbed training examples, have been extensively studied as a means of improving the robustness of deep neural networks. However, most adversarial training methods treat all training examples equally, although each example may have a different impact on the model's robustness over the course of training. Recent works have exploited this unequal importance of adversarial samples to the model's robustness and have been shown to obtain high robustness against untargeted PGD attacks. However, we empirically observe that they cause the feature spaces of adversarial samples from different classes to overlap, and thus yield more high-entropy samples whose labels can be easily flipped. This makes them more vulnerable to targeted adversarial perturbations. To address this limitation, we propose a simple yet effective weighting scheme, Entropy-Weighted Adversarial Training (EWAT), which weighs the loss of each adversarial training example in proportion to the entropy of its predicted distribution, focusing on examples whose labels are more uncertain. We validate our method on multiple benchmark datasets and show that it achieves a substantial increase in robust accuracy.
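The core idea of the abstract, weighing each example's loss by the entropy of its predicted distribution, can be sketched in a few lines. The following is a minimal illustration in PyTorch, not the authors' implementation: the function name, the exact normalization of the weights, and the omission of the PGD attack step are all assumptions made for brevity.

```python
# Sketch of an entropy-weighted per-example loss (illustrative only;
# the exact weighting/normalization used by EWAT may differ).
import torch
import torch.nn.functional as F

def entropy_weighted_loss(logits, targets):
    """Cross-entropy per example, weighted by the entropy of the
    model's predicted distribution for that example."""
    probs = F.softmax(logits, dim=1)
    # H(p) = -sum_c p_c log p_c, computed per example
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    ce = F.cross_entropy(logits, targets, reduction="none")
    # Normalize weights so they average to 1 over the batch, then
    # up-weight uncertain (high-entropy) examples.
    weights = entropy / entropy.sum().clamp_min(1e-12) * entropy.numel()
    return (weights * ce).mean()
```

In full adversarial training, `logits` would be the model's outputs on adversarially perturbed inputs (e.g. produced by PGD), so uncertain adversarial examples dominate the gradient.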

Author Information

Minseon Kim (Korea Advanced Institute of Science and Technology)
Jihoon Tack (KAIST)
Jinwoo Shin (KAIST)
Sung Ju Hwang (UNIST)
