Detecting Adversarial Samples from Artifacts

About

Deep neural networks (DNNs) are powerful nonlinear architectures that are known to be robust to random perturbations of the input. However, these models are vulnerable to adversarial perturbations--small input changes crafted explicitly to fool the model. In this paper, we ask whether a DNN can distinguish adversarial samples from their normal and noisy counterparts. We investigate model confidence on adversarial samples by looking at Bayesian uncertainty estimates, available in dropout neural networks, and by performing density estimation in the subspace of deep features learned by the model. The result is a method for implicit adversarial detection that is oblivious to the attack algorithm. We evaluate this method on a variety of standard datasets including MNIST and CIFAR-10 and show that it generalizes well across different architectures and attacks. Our findings report that 85-93% ROC-AUC can be achieved on a number of standard classification tasks with a negative class that consists of both normal and noisy samples.
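The two signals named in the abstract, a dropout-based uncertainty estimate and a density estimate in deep feature space, can be sketched as follows. This is a minimal illustration and not the authors' implementation. The function names, the variance-style form of the uncertainty, the Gaussian kernel, and the `bandwidth` parameter are all assumptions made for the example. The abstract does not specify how the two scores are combined into a detector, so that step is left out.

```python
import numpy as np


def mc_dropout_uncertainty(stochastic_probs):
    """Variance-style uncertainty from T stochastic forward passes.

    stochastic_probs: array of shape (T, C), the class-probability vectors
    obtained by running the same input T times with dropout left on.
    Returns mean(||y_t||^2) - ||mean(y_t)||^2 (an assumed form, see lead-in).
    """
    probs = np.asarray(stochastic_probs, dtype=float)
    mean_sq_norm = np.mean(np.sum(probs ** 2, axis=1))
    mean_vec = probs.mean(axis=0)
    return float(mean_sq_norm - mean_vec @ mean_vec)


def kernel_density(feature, class_features, bandwidth=1.0):
    """Gaussian kernel density of one deep-feature vector.

    feature: array of shape (D,), the deep features of the input under test.
    class_features: array of shape (N, D), deep features of training points
    from the class the model predicted for that input.
    bandwidth: hypothetical tuning parameter; it would need to be chosen
    per dataset.
    """
    feature = np.asarray(feature, dtype=float)
    class_features = np.asarray(class_features, dtype=float)
    sq_dists = np.sum((class_features - feature) ** 2, axis=1)
    return float(np.mean(np.exp(-sq_dists / bandwidth ** 2)))


if __name__ == "__main__":
    # Toy numbers only: consistent passes vs. passes that disagree.
    stable = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
    unstable = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    print("uncertainty (stable):  ", mc_dropout_uncertainty(stable))
    print("uncertainty (unstable):", mc_dropout_uncertainty(unstable))

    # Toy feature cloud around the origin; one query near it, one far away.
    rng = np.random.default_rng(0)
    train_feats = rng.normal(size=(100, 4))
    print("density (near):", kernel_density(np.zeros(4), train_feats))
    print("density (far): ", kernel_density(np.full(4, 5.0), train_feats))
```

Under these assumptions, an input whose dropout passes disagree gets a higher uncertainty score, and an input whose deep features lie far from the training features of its predicted class gets a lower density score. Both are the kind of artifact the abstract describes using to flag adversarial samples.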

Reuben Feinman, Ryan R. Curtin, Saurabh Shintre, Andrew B. Gardner • 2017

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Adversarial Detection | CIFAR-10 clean (test) | TPR-95 | 85.51 | 23 |
| Adversarial Attack Detection | CIFAR-100 1.0 (Clean) | TPR-95 | 62.04 | 16 |
| Adversarial Attack Detection | CIFAR-100 PGD-10 (l_inf, 8/255) 1.0 | TPR-95 | 32.59 | 16 |
| Adversarial Attack Detection | CIFAR-100 PGD-10 (l_inf, 16/255) 1.0 | TPR-95 | 18.19 | 16 |
| Adversarial Attack Detection | CIFAR-100 PGD-10 (l_2, 128/255) 1.0 | TPR-95 | 0.4166 | 16 |
| Adversarial Robustness (Rejection) | CIFAR-10 PGD-100, l_inf, 16/255 (test) | TPR-95 | 31.97 | 15 |
| Adversarial Robustness (Rejection) | CIFAR-10 PGD-100, l_inf, 8/255 (test) | TPR-95 | 53.12 | 15 |
| Adversarial Robustness (Rejection) | CIFAR-10 PGD-100, l_2, 128/255 (test) | TPR-95 | 64.6 | 15 |
| Adversarial Attack Detection | CIFAR-10 known attack 1.0 (test) | AUROC (FGSM) | 90.13 | 12 |
| Adversarial Detection | CIFAR-10 PGD-10, l∞, ε=16/255 (test) | TPR-95 | 34.87 | 8 |
Showing 10 of 16 rows
