
Rethinking the Reverse-engineering of Trojan Triggers

About

Deep Neural Networks are vulnerable to Trojan (or backdoor) attacks. Reverse-engineering methods can reconstruct the trigger and thus identify affected models. Existing reverse-engineering methods only consider input space constraints, e.g., trigger size in the input space. Specifically, they assume the triggers are static patterns in the input space and fail to detect models with feature space triggers such as image style transformations. We observe that both input-space and feature-space Trojans are associated with feature space hyperplanes. Based on this observation, we design a novel reverse-engineering method that exploits the feature space constraint to reverse-engineer Trojan triggers. Results on four datasets and seven different attacks demonstrate that our solution effectively defends against both input-space and feature-space Trojans. It outperforms state-of-the-art reverse-engineering methods and other types of defenses in both Trojaned model detection and mitigation tasks. On average, the detection accuracy of our method is 93%. For Trojan mitigation, our method can reduce the ASR (attack success rate) to only 0.26% with the BA (benign accuracy) remaining nearly unchanged. Our code can be found at https://github.com/RU-System-Software-and-Security/FeatureRE.

Zhenting Wang, Kai Mei, Hailun Ding, Juan Zhai, Shiqing Ma • 2022
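
The abstract reports mitigation quality with two metrics, ASR and BA. The sketch below shows one common way to compute them from predicted labels. It is an illustration only, not code from the FeatureRE repository: the function names are hypothetical, and the choice to exclude samples whose true label already equals the attack target when computing ASR is an assumption about the evaluation protocol.

```python
from typing import Sequence


def benign_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Percentage of clean (untriggered) inputs classified correctly."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("need at least one sample")
    correct = sum(p == y for p, y in zip(preds, labels))
    return 100.0 * correct / len(labels)


def attack_success_rate(
    triggered_preds: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """Percentage of triggered inputs classified as the attacker's target.

    Assumption: samples whose true label is already the target label are
    skipped, since predicting the target for them is not an attack success.
    """
    if len(triggered_preds) != len(true_labels):
        raise ValueError("triggered_preds and true_labels must have the same length")
    pairs = [(p, y) for p, y in zip(triggered_preds, true_labels) if y != target_label]
    if not pairs:
        raise ValueError("no samples with a non-target true label")
    hits = sum(p == target_label for p, _ in pairs)
    return 100.0 * hits / len(pairs)
```

A successful mitigation lowers the value returned by `attack_success_rate` on triggered inputs while leaving `benign_accuracy` on clean inputs close to its pre-mitigation value.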

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Backdoor Detection | CIFAR-10 | Bd. Rate | 20 | 120 |
| Backdoor Defense | CIFAR-10 | Attack Success Rate | 99.79 | 78 |
| Trojan Detection | CIFAR-10 | True Positives (TP) | 20 | 22 |
| Trojan mitigation | CIFAR-10 (test) | Benign Accuracy | 91.79 | 12 |
| Trojan Defense | MNIST | Benign Accuracy | 99.2 | 11 |
| Backdoor Defense | CIFAR-10 WaNet attack | Backdoor Accuracy | 93.67 | 8 |
| Backdoor Defense | CIFAR-10 Bpp attack | Backdoor Accuracy | 94.21 | 8 |
| Backdoor Defense | CIFAR-10 IAD attack | Backdoor Accuracy | 92.73 | 8 |
| Trojan Detection | GTSRB | True Positives (TP) | 8 | 5 |
| Backdoor Defense | CIFAR-10 Blend attack | Backdoor Accuracy | 93.2 | 5 |
Showing 10 of 13 rows
