Workshop: New Frontiers in Adversarial Machine Learning

Gradient-Based Adversarial and Out-of-Distribution Detection

Jinsol Lee · Mohit Prabhushankar · Ghassan AlRegib


We propose to use gradients for detecting adversarial and out-of-distribution samples. We introduce confounding labels (labels that differ from the normal labels seen during training) in gradient generation to probe the effective expressivity of neural networks. Gradients depict the amount of change required for a model to properly represent given inputs, providing insight into the representational power of the model established by network architectural properties as well as training data. By introducing a label of different design, we remove the dependency on ground-truth labels for gradient generation during inference. We show that our gradient-based approach captures anomalies in inputs based on the effective expressivity of the models, without hyperparameter tuning or additional processing, and outperforms state-of-the-art methods for adversarial and out-of-distribution detection.
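The following is a minimal sketch, not the authors' implementation, of how a gradient-based score can be computed at inference time without a ground-truth label. It rests on several assumptions. The model is a linear classifier, so gradients have a closed form instead of coming from backpropagation through a deep network. The confounding label is a hypothetical all-ones vector. The loss is per-class binary cross-entropy. The anomaly score is the squared L2 norm of the parameter gradients. All function names are illustrative. How the score is thresholded, and which direction indicates an anomaly, is not specified here.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def softplus(t: float) -> float:
    # Numerically stable log(1 + exp(t)).
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def sigmoid(v: float) -> float:
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def logits(W: Matrix, b: Sequence[float], x: Sequence[float]) -> List[float]:
    # Linear classifier z = W x + b (a stand-in for a deep network's output).
    return [sum(w * xj for w, xj in zip(row, x)) + bk for row, bk in zip(W, b)]


def confounding_loss(W: Matrix, b: Sequence[float], x: Sequence[float]) -> float:
    # Per-class binary cross-entropy against a hypothetical all-ones confounding
    # label: -sum_k log(sigmoid(z_k)) == sum_k softplus(-z_k).
    return sum(softplus(-zk) for zk in logits(W, b, x))


def confounding_gradients(
    W: Matrix, b: Sequence[float], x: Sequence[float]
) -> Tuple[Matrix, List[float]]:
    # For the all-ones label, dL/dz_k = sigmoid(z_k) - 1; no ground truth is used.
    dz = [sigmoid(zk) - 1.0 for zk in logits(W, b, x)]
    grad_W = [[d * xj for xj in x] for d in dz]  # dL/dW_kj = dL/dz_k * x_j
    grad_b = list(dz)  # dL/db_k = dL/dz_k
    return grad_W, grad_b


def gradient_score(W: Matrix, b: Sequence[float], x: Sequence[float]) -> float:
    # Hypothetical anomaly score: squared L2 norm of all parameter gradients.
    grad_W, grad_b = confounding_gradients(W, b, x)
    return sum(g * g for row in grad_W for g in row) + sum(g * g for g in grad_b)


if __name__ == "__main__":
    W = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
    b = [0.0, 0.0, 0.0]
    # Every logit is 0 here, so each dL/db_k = -0.5 and the score is 3 * 0.25.
    print(gradient_score(W, b, [0.0, 0.0]))  # prints 0.75
```

Because the confounding label is fixed in advance, the same score can be computed for any test input. In a deep network, the gradients would come from an automatic-differentiation backward pass rather than from the closed-form expressions used here.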

Chat is not available.