
Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks

About

Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by adversarial examples that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model's prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives. This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
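The detection scheme described above can be sketched in a few lines. This is a minimal illustration, not the authors' released implementation: the function names (`squeeze_bit_depth`, `squeeze_spatial_smooth`, `detect_adversarial`), the 4-bit depth, the 2x2 median filter, and the L1-distance threshold are illustrative assumptions; the paper tunes these per dataset.

```python
import numpy as np
from scipy.ndimage import median_filter


def squeeze_bit_depth(x, bits):
    """Reduce color bit depth: quantize floats in [0, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels


def squeeze_spatial_smooth(x, size=2):
    """Local spatial smoothing with a median filter over size x size windows."""
    return median_filter(x, size=size)


def detect_adversarial(model_predict, x, threshold=1.0, bits=4):
    """Flag x as adversarial if the model's output on the original input
    differs too much (in L1 distance) from its output on any squeezed input.

    model_predict: callable mapping an image array to a probability vector.
    threshold: illustrative cutoff; in practice chosen on legitimate data
    to fix the false-positive rate.
    """
    p_orig = model_predict(x)
    squeezed_inputs = (squeeze_bit_depth(x, bits), squeeze_spatial_smooth(x))
    scores = [np.abs(p_orig - model_predict(s)).sum() for s in squeezed_inputs]
    return max(scores) > threshold
```

A joint detector, as in the paper, simply takes the maximum score over several squeezers (as above) so that an attack must survive every squeezer simultaneously to evade detection.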

Weilin Xu, David Evans, Yanjun Qi • 2017

Related benchmarks

Task                                                Dataset                 Metric        Result   Rank
Adversarial Detection                               ImageNet (val)          AUROC (PGD)   94.71    14
Multi-task Driving Scene Understanding Robustness   The Dolphins (Lvl. 0)   Final Score   45.1     8
Multi-task Driving Scene Understanding Robustness   The Dolphins (Lvl. 1)   Final Score   45.59    8
Multi-task Driving Scene Understanding Robustness   The Dolphins (Lvl. 2)   Final Score   44.1     8
Multi-task Driving Scene Understanding Robustness   The Dolphins (Lvl. 3)   Final Score   43.2     8
Multi-task Driving Scene Understanding Robustness   The Dolphins (Lvl. 4)   Final Score   40.6     8
