
Defense against backdoor attacks via robust covariance estimation
Jonathan Hayase · Weihao Kong · Raghav Somani · Sewoong Oh

Thu Jul 22 07:30 AM -- 07:35 AM (PDT)

Modern machine learning increasingly requires training on large collections of data from multiple sources, not all of which can be trusted. A particularly frightening scenario is when a small fraction of corrupted data changes the behavior of the trained model when triggered by an attacker-specified watermark. Such a compromised model can be deployed unnoticed, since it remains accurate on clean inputs. There have been promising attempts to use the intermediate representations of such a model to separate corrupted examples from clean ones. However, these methods require a significant fraction of the data to be corrupted in order to produce a strong enough signal for detection. We propose a novel defense algorithm that uses robust covariance estimation to amplify the spectral signature of corrupted data. This defense completely removes backdoors whenever the benchmark backdoor attacks are successful, even in regimes where previous methods have no hope of detecting the poisoned examples.
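The abstract's core idea can be illustrated with a minimal sketch: robustly estimate the mean and covariance of the intermediate representations (so the estimate is not skewed by the poisoned points), whiten the representations with that estimate, and then score each example by its projection onto the top singular direction of the whitened data. This is not the authors' exact algorithm; the trimming-based robust estimator below and all function names are simplifying assumptions for illustration only.

```python
import numpy as np

def robust_covariance(X, trim_frac=0.1):
    # Illustrative stand-in for a robust covariance estimator (an assumption,
    # not the paper's method): trim the points farthest from the mean, then
    # recompute mean and covariance on the remaining points.
    mu = X.mean(axis=0)
    dist = np.linalg.norm(X - mu, axis=1)
    keep = dist <= np.quantile(dist, 1.0 - trim_frac)
    X_kept = X[keep]
    return X_kept.mean(axis=0), np.cov(X_kept, rowvar=False)

def spectral_scores(X, mu, cov, eps=1e-6):
    # Whiten representations with the robust covariance estimate, then score
    # each example by the magnitude of its projection onto the top singular
    # direction of the whitened data. Whitening suppresses clean-data
    # variance, amplifying the spectral signature of the corrupted examples.
    d = X.shape[1]
    L = np.linalg.cholesky(cov + eps * np.eye(d))  # cov = L @ L.T
    Z = (X - mu) @ np.linalg.inv(L).T              # whitened representations
    _, _, Vt = np.linalg.svd(Z - Z.mean(axis=0), full_matrices=False)
    return np.abs(Z @ Vt[0])

# Hypothetical usage: representations of mostly clean data with a small
# poisoned subpopulation shifted along one direction.
rng = np.random.default_rng(0)
clean = rng.normal(size=(500, 10))
poison = rng.normal(size=(25, 10))
poison[:, 0] += 4.0  # watermark-induced shift in representation space
X = np.vstack([clean, poison])

mu, cov = robust_covariance(X)
scores = spectral_scores(X, mu, cov)
# Examples with the highest scores are flagged as likely poisoned and removed.
```

The trimming step matters: if the covariance were estimated naively on all of the data, the poisoned points would inflate the variance along their own direction, and whitening would partially cancel the very signal the defense is trying to amplify.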

Author Information

Jonathan Hayase (University of Washington)
Weihao Kong (University of Washington)
Raghav Somani (University of Washington)

I am broadly interested in the aspects of Large-Scale Optimization and Probability theory that arise in fundamental Machine Learning.

Sewoong Oh (University of Washington)
