
Not All Poisons are Created Equal: Robust Training against Data Poisoning
Yu Yang · Tian Yu Liu · Baharan Mirzasoleiman


Data poisoning causes misclassification of test-time samples by injecting maliciously crafted samples into the training data. Existing defenses are often effective only against a specific type of targeted attack, significantly degrade generalization performance, or are prohibitively expensive for standard deep learning pipelines. In this work, we propose an efficient defense mechanism that significantly reduces the success rate of various data poisoning attacks and provides theoretical guarantees for the performance of the model. We make the following observations: (i) targeted attacks add bounded perturbations to a randomly selected subset of training data to match the gradient of the target; (ii) under bounded perturbations, only a small number of poisons can be optimized to have a gradient that is close enough to that of the target to make the attack successful; (iii) such examples move away from their original class and become isolated in the gradient space. We show that training on large gradient clusters of each class can successfully eliminate the effective poisons and guarantees training dynamics similar to those of training on the full data. Our extensive experiments show that our method significantly decreases the success rate of state-of-the-art targeted attacks, including Gradient Matching and Bullseye Polytope, and easily scales to large datasets.
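The core idea in observation (iii) can be illustrated with a minimal sketch: cluster per-sample gradient vectors within a class and keep only samples whose cluster is large, dropping isolated gradients that are the likely effective poisons. This is not the authors' implementation; the greedy cosine-similarity clustering, the `tau` threshold, and the `min_size` cutoff below are illustrative choices for exposition.

```python
import math

def cosine(u, v):
    # Cosine similarity between two gradient vectors (assumed nonzero).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def filter_isolated_gradients(grads, tau=0.9, min_size=3):
    """Greedily cluster per-sample gradients of one class by cosine
    similarity to cluster centroids; return indices of samples whose
    cluster has at least `min_size` members. Gradients that end up
    isolated in small clusters are discarded before training."""
    clusters = []  # each entry: [running_sum_vector, member_indices]
    for i, g in enumerate(grads):
        placed = False
        for summed, members in clusters:
            centroid = [s / len(members) for s in summed]
            if cosine(g, centroid) >= tau:
                summed[:] = [s + x for s, x in zip(summed, g)]
                members.append(i)
                placed = True
                break
        if not placed:
            clusters.append([list(g), [i]])
    keep = [i for summed, members in clusters
            if len(members) >= min_size for i in members]
    return sorted(keep)

# Toy example: five aligned gradients plus one isolated (poison-like) one.
grads = [[1.0, 0.0]] * 5 + [[0.0, 1.0]]
print(filter_isolated_gradients(grads))  # the isolated index 5 is dropped
```

In a real pipeline, `grads` would be per-sample (or low-dimensional projections of) loss gradients recomputed periodically during training, so the filter adapts as the model evolves.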

Author Information

Yu Yang (University of California, Los Angeles)
Tian Yu Liu (UCLA)
Baharan Mirzasoleiman (University of California, Los Angeles)
