
Backdoor Mitigation on SFT-based Poisoning Word trigger

Clean Accuracy (CACC): 96.7

SFT

[Chart: Clean Accuracy (CACC); y-axis ticks 86.5288, 89.1694, 91.81, 94.4506; x-axis Oct 11, 2025]
Updated 20d ago

Evaluation Results

Method    Links
2025.10
96.7    78.61
2025.10
96.2    7.5
2025.10
95.8    0.98
2025.10
95.7    1.6
2025.10
95.590.2
2025.10
95.188.44
2025.10
95.03    0.64
2025.10
94.932.8
2025.10
94.698
2025.10
94.617.22
2025.10
94.6    8.25
2025.10
94.6    86
2025.10
94.254
2025.10
94.2    8.25
2025.10
94.1    6.39
2025.10
94    30
2025.10
93.898
2025.10
86.923.6