Design and Evaluation of a Multi-Domain Trojan Detection Method on Deep Neural Networks

About

This work corroborates a run-time Trojan detection method that exploits STRong Intentional Perturbation of inputs and serves as a multi-domain Trojan detection defence across the Vision, Text and Audio domains, hence the name STRIP-ViTA. Specifically, STRIP-ViTA is the first confirmed Trojan detection method that is demonstrably independent of both the task domain and the model architecture. We have extensively evaluated STRIP-ViTA on: i) the CIFAR10 and GTSRB datasets using 2D CNNs, plus a public third-party Trojaned model, for vision tasks; ii) the IMDB and consumer complaint datasets using both LSTMs and 1D CNNs for text tasks; and iii) a speech command dataset using both 1D CNNs and 2D CNNs for audio tasks. Experimental results based on 28 tested Trojaned models demonstrate that STRIP-ViTA performs well across all nine architectures and five datasets. In general, STRIP-ViTA can effectively detect Trojan inputs with a small false acceptance rate (FAR) at an acceptable preset false rejection rate (FRR). In particular, for vision tasks, we can always achieve 0% FRR and FAR. With the FRR set to 3%, average FARs of 1.1% and 3.55% are achieved for text and audio tasks, respectively. Moreover, we have evaluated and shown the effectiveness of STRIP-ViTA against a number of advanced backdoor attacks, whereas other state-of-the-art methods lose effectiveness against one or all of these attacks.

Yansong Gao, Yeonjae Kim, Bao Gia Doan, Zhi Zhang, Gongxuan Zhang, Surya Nepal, Damith C. Ranasinghe, Hyoungshick Kim • 2019
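
The abstract reports FAR at a preset FRR. The following is a minimal sketch of how such an operating point could be computed, not the authors' implementation. It assumes each input has already been reduced to a single detection score, with lower scores indicating a Trojaned input (in STRIP-style detection this score would be the prediction entropy under perturbation). The function names and the synthetic Gaussian scores are hypothetical and serve only as illustration.

```python
import random

def boundary_for_frr(clean_scores, target_frr):
    """Pick a detection boundary so that at most `target_frr` of clean
    inputs score strictly below it (and would be falsely rejected)."""
    ordered = sorted(clean_scores)
    k = int(target_frr * len(ordered))  # clean inputs we tolerate rejecting
    return ordered[k]

def frr_and_far(clean_scores, trojan_scores, boundary):
    """Inputs scoring below the boundary are flagged as Trojaned.
    FRR: fraction of clean inputs flagged (false rejections).
    FAR: fraction of Trojaned inputs not flagged (false acceptances)."""
    frr = sum(s < boundary for s in clean_scores) / len(clean_scores)
    far = sum(s >= boundary for s in trojan_scores) / len(trojan_scores)
    return frr, far

# Synthetic scores (assumption): clean inputs score high, Trojaned inputs low.
random.seed(0)
clean = [random.gauss(1.0, 0.2) for _ in range(1000)]
trojan = [max(0.0, random.gauss(0.1, 0.05)) for _ in range(1000)]

boundary = boundary_for_frr(clean, target_frr=0.03)
frr, far = frr_and_far(clean, trojan, boundary)
print(f"boundary={boundary:.3f}  FRR={frr:.3%}  FAR={far:.3%}")
```

The boundary is derived from clean inputs only, so it can be fixed before any Trojaned input is seen. Raising the preset FRR moves the boundary up and can only lower the FAR.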

Related benchmarks

Task	Dataset	Result
Sentiment Classification	SST2 (test)	Accuracy 89.45	233
Sentiment Classification	IMDB (test)	--	144
Poisoned sample detection	TrojAI round 6 (test)	Precision 0.917	96
Text Classification	Subj	CA (%) 0.967	94
Topic Classification	AG's News	ASR 97.58	70
Backdoor Defense	SST-2	CACC 91.39	65
Targeted attack detection	Alpaca OnlyTarget Short	TPR 100	56
Detection Efficiency	Alpaca OnlyTarget Long (malicious)	ATGR 5.353	56
Detection Efficiency	Alpaca OnlyTarget Long (benign)	ATGR 3.778	56
Targeted attack detection	Alpaca OnlyTarget Medium	TPR 100	56

Showing 10 of 40 rows
