Poster
Defense against backdoor attacks via robust covariance estimation
Jonathan Hayase · Weihao Kong · Raghav Somani · Sewoong Oh
Keywords: [ Adversarial Examples ] [ Algorithms ]
Modern machine learning increasingly requires training on a large collection of data from multiple sources, not all of which can be trusted. A particularly concerning scenario is when a small fraction of corrupted data changes the behavior of the trained model when triggered by an attacker-specified watermark. Such a compromised model can be deployed unnoticed, since it remains accurate on clean inputs. There have been promising attempts to use the intermediate representations of such a model to separate corrupted examples from clean ones. However, these methods require a significant fraction of the data to be corrupted in order to produce a strong enough signal for detection. We propose a novel defense algorithm that uses robust covariance estimation to amplify the spectral signature of corrupted data. This defense completely removes backdoors whenever the benchmark backdoor attacks are successful, even in regimes where previous methods have no hope of detecting the poisoned examples.
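To illustrate the general idea of spectral-signature-style filtering after robust whitening, the following is a minimal sketch, not the authors' algorithm: it scores examples by their projection onto the top singular direction of representations whitened with a robustly estimated covariance. The use of scikit-learn's MinCovDet is a stand-in for a robust covariance estimator, and the function name, parameters, and thresholds are illustrative assumptions.

```python
import numpy as np
from sklearn.covariance import MinCovDet  # stand-in for a robust covariance estimator


def spectral_scores_robust(reps, support_fraction=0.7):
    """Score examples by their squared projection onto the top singular
    direction of representations whitened with a robust covariance estimate.
    Higher scores suggest a stronger spectral (possibly backdoor) signature.
    This is an illustrative sketch, not the paper's exact defense."""
    mcd = MinCovDet(support_fraction=support_fraction).fit(reps)
    centered = reps - mcd.location_
    # Inverse square root of the robust covariance via eigendecomposition.
    vals, vecs = np.linalg.eigh(mcd.covariance_)
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(np.clip(vals, 1e-8, None))) @ vecs.T
    whitened = centered @ inv_sqrt
    # Top right-singular vector of the whitened representations.
    _, _, vt = np.linalg.svd(whitened, full_matrices=False)
    return (whitened @ vt[0]) ** 2


# Usage sketch: drop the highest-scoring examples before retraining.
reps = np.random.randn(500, 32)             # placeholder hidden representations
scores = spectral_scores_robust(reps)
keep = scores < np.quantile(scores, 0.85)   # remove the top 15% as suspect (arbitrary cutoff)
```

The whitening step is what "amplifies" the signature: directions in which the corrupted examples deviate are rescaled relative to the clean data's robustly estimated spread, so outliers stand out even when the poisoned fraction is small.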