Universal adversarial perturbations

About

Given a state-of-the-art deep neural network classifier, we show the existence of a universal (image-agnostic) and very small perturbation vector that causes natural images to be misclassified with high probability. We propose a systematic algorithm for computing universal perturbations and show that state-of-the-art deep neural networks are highly vulnerable to such perturbations, even though the perturbations are quasi-imperceptible to the human eye. We further analyze these universal perturbations empirically and show, in particular, that they generalize very well across neural networks. The surprising existence of universal perturbations reveals important geometric correlations between different parts of the high-dimensional decision boundary of classifiers. It also points to potential security breaches: there exist single directions in the input space that adversaries could exploit to break a classifier on most natural images.
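The abstract does not spell out the algorithm. The following is a minimal sketch of one iterative scheme in this spirit, under stated assumptions: a toy affine classifier stands in for a deep network, each per-sample step is a DeepFool-style minimal L2 perturbation (exact only for affine models), and the budget is an L2 ball of radius `xi`. The scheme sweeps the data; whenever the current `v` fails to fool a sample, it adds the smallest perturbation that would, then projects `v` back onto the budget. It stops once the fooling rate reaches `1 - delta`. All names (`LinearClassifier`, `universal_perturbation`, `xi`, `delta`, `overshoot`) are hypothetical, and the paper's exact procedure, norm, and stopping rule may differ.

```python
import math
import random

Vector = list[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]

def scale(a: Vector, s: float) -> Vector:
    return [s * x for x in a]

def norm2(a: Vector) -> float:
    return math.sqrt(dot(a, a))

class LinearClassifier:
    """Hypothetical affine classifier f(x) = W x + b, a toy stand-in for a deep network."""
    def __init__(self, W: list[Vector], b: Vector):
        self.W, self.b = W, b

    def scores(self, x: Vector) -> Vector:
        return [dot(w, x) + bi for w, bi in zip(self.W, self.b)]

    def predict(self, x: Vector) -> int:
        s = self.scores(x)
        return max(range(len(s)), key=s.__getitem__)  # ties go to the lowest index

def minimal_perturbation(clf: LinearClassifier, x: Vector, overshoot: float = 0.02) -> Vector:
    """DeepFool-style step: smallest L2 move past the nearest decision boundary.
    Exact for an affine classifier; a deep net would need re-linearization."""
    s, k = clf.scores(x), clf.predict(x)
    best = None
    for l in range(len(s)):
        if l == k:
            continue
        w_diff = [wl - wk for wl, wk in zip(clf.W[l], clf.W[k])]
        n2 = dot(w_diff, w_diff)
        if n2 == 0.0:
            continue
        dist = (s[k] - s[l]) / math.sqrt(n2)  # L2 distance to the k-vs-l boundary
        if best is None or dist < best[0]:
            best = (dist, s[k] - s[l], w_diff, n2)
    if best is None:
        return [0.0] * len(x)
    _, gap, w_diff, n2 = best
    return scale(w_diff, (1.0 + overshoot) * gap / n2)

def project_l2(v: Vector, xi: float) -> Vector:
    """Project v onto the L2 ball of radius xi (the imperceptibility budget)."""
    n = norm2(v)
    return v if n <= xi else scale(v, xi / n)

def fooling_rate(clf: LinearClassifier, xs: list[Vector], v: Vector) -> float:
    """Fraction of inputs whose predicted label changes when v is added."""
    return sum(clf.predict(add(x, v)) != clf.predict(x) for x in xs) / len(xs)

def universal_perturbation(clf, xs, xi, delta=0.2, max_passes=10, seed=0) -> Vector:
    """Accumulate per-sample minimal perturbations into one v with ||v||_2 <= xi."""
    rng = random.Random(seed)
    v = [0.0] * len(xs[0])
    order = list(range(len(xs)))
    for _ in range(max_passes):
        if fooling_rate(clf, xs, v) >= 1.0 - delta:  # target fooling rate reached
            break
        rng.shuffle(order)
        for i in order:
            x = xs[i]
            if clf.predict(add(x, v)) == clf.predict(x):  # v does not fool x yet
                dv = minimal_perturbation(clf, add(x, v))
                v = project_l2(add(v, dv), xi)
    return v

# Toy demo: two 2-D linear classifiers that partly share a decision direction.
xs = [[0.2, 0.0], [0.5, 1.0], [0.8, -1.0]]
clf_a = LinearClassifier([[1.0, 0.0], [-1.0, 0.0]], [0.0, 0.0])
clf_b = LinearClassifier([[1.0, 0.5], [-1.0, -0.5]], [0.0, 0.0])

v = universal_perturbation(clf_a, xs, xi=1.5)
print(f"||v||_2 = {norm2(v):.3f}")
print(f"fooling rate on A: {fooling_rate(clf_a, xs, v):.2f}")  # prints 1.00
print(f"fooling rate on B: {fooling_rate(clf_b, xs, v):.2f}")  # prints 0.67 (transfer)
```

On this toy data, the single vector `v` fools every sample on `clf_a`. Because `clf_b` shares part of its decision direction, `v` also fools two of the three samples there. This is a small-scale analogue of cross-model transfer, not a reproduction of the paper's experiments. Shrinking `xi` (for example to `0.1`) keeps `v` within budget but leaves these samples unfooled, which illustrates the trade-off between perceptibility and fooling rate.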

Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, Pascal Frossard • 2016

Related benchmarks

Task	Dataset	Metric	Value	#
Universal Targeted Adversarial Attack	Unseen (test)	KMRa	40.2	18
Universal Targeted Adversarial Attack	Seen Samples (Used for Optimization) (train)	KMRa	14.9	18
Adversarial Attack	Cityscapes (test)	ASR	8.13	12
Adversarial Attack	SA-1B (test)	ASR	5.28	12
Adversarial Attack	ADE20K (test)	ASR	1.62	11
Adversarial Attack	COCO (test)	ASR	47	10
Attack Success Rate	PandaGPT Image Modality	Exact ASR	0	8
Attack Success Rate	PandaGPT Audio Modality	Exact ASR	0	3
Attack Success Rate	PandaGPT Text Modality	Exact ASR	97	3

