VoteTRANS: Detecting Adversarial Text without Training by Voting on Hard Labels of Transformations

About

Adversarial attacks reveal serious flaws in deep learning models. More dangerously, these attacks preserve the original meaning and escape human recognition. Existing methods for detecting these attacks must be trained on original and adversarial data. In this paper, we propose VoteTRANS, a detection method that requires no training and instead votes on the hard labels predicted for transformations of the input. Specifically, VoteTRANS detects adversarial text by comparing the hard label of the input text with the hard labels of its transformations. The evaluation demonstrates that VoteTRANS effectively detects adversarial text across various state-of-the-art attacks, models, and datasets.
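The voting idea described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation. `classify` stands in for any model that returns a hard label. `transform` stands in for whatever transformations are applied; here it is a hypothetical toy synonym table. A plain majority vote is assumed as the decision rule. The input is flagged as adversarial when the majority label of its transformations disagrees with the label of the input itself.

```python
from collections import Counter
from typing import Callable, Dict, List


def vote_trans_detect(text: str,
                      classify: Callable[[str], int],
                      transform: Callable[[str], List[str]]) -> bool:
    """Return True if `text` is flagged as adversarial.

    Sketch only: assumes a hard-label classifier, an arbitrary
    transformation function, and a simple majority vote.
    """
    input_label = classify(text)           # hard label of the input itself
    variants = transform(text)             # transformed copies of the input
    if not variants:
        # Nothing to vote on: treat the input as clean (an assumption).
        return False
    votes = Counter(classify(v) for v in variants)   # hard labels only
    majority_label, _ = votes.most_common(1)[0]
    return majority_label != input_label   # disagreement => adversarial


# --- Toy example: hypothetical model and synonym table ---
POSITIVE_WORDS = {"great", "excellent", "superb"}
SYNONYMS: Dict[str, List[str]] = {
    "great": ["excellent", "superb", "grand"],
    "grand": ["great", "excellent", "superb"],
}


def toy_classify(text: str) -> int:
    """Hypothetical sentiment model: 1 if any known positive word appears."""
    return 1 if POSITIVE_WORDS & set(text.split()) else 0


def synonym_variants(text: str) -> List[str]:
    """Hypothetical transformation: swap one word at a time for a synonym."""
    words = text.split()
    variants = []
    for i, word in enumerate(words):
        for syn in SYNONYMS.get(word, []):
            variants.append(" ".join(words[:i] + [syn] + words[i + 1:]))
    return variants


if __name__ == "__main__":
    # "grand" is an attacker's synonym swap that fools the toy model.
    print(vote_trans_detect("the movie was grand", toy_classify, synonym_variants))  # True
    print(vote_trans_detect("the movie was great", toy_classify, synonym_variants))  # False
```

In the toy run, the perturbed sentence is labeled 0, but all three of its synonym variants are labeled 1, so the vote disagrees and the text is flagged. The clean sentence keeps its label in two of three variants and passes. Ties are resolved by `Counter.most_common`, which keeps first-seen order among equal counts. A real detector would need a more deliberate tie rule.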

Hoang-Quoc Nguyen-Son, Seira Hidano, Kazuhide Fukushima, Shinsaku Kiyomoto, Isao Echizen • 2023

Related benchmarks

Task	Dataset	Metric	Result	
Adversarial Text Detection	IMDB	F1 Score	97.7	25
Adversarial Text Detection	IMDB (test)	F1 Score	97.8	24
Adversarial Text Detection	AG-News	F1 Score	96.7	24
Adversarial Attack	IMDB (test)	Attack Success Rate	4.3	21
Adversarial Text Detection	Yelp	F1 Score	97.4	15
Adversarial Text Detection	RTMR	F1 Score	86.9	11
Adversarial Text Detection	Yelp (test)	F1 Score	0.974	7
Adversarial Text Detection	AG News (test)	F1 Score	95.5	6
Adversarial Text Detection	RTMR (test)	F1 Score	83.8	3
Adversarial Attack	AG News (test)	Attack Success Rate	0.004	3
