Backdoor Mitigation on SFT-based Poisoning Phrase trigger

Clean Accuracy (CACC): 95.7

SFT

[Chart: Clean Accuracy (CACC); date shown: Oct 11, 2025]
Updated 20d ago

Evaluation Results

Date      CACC    (unlabeled)
2025.10   95.7    100
2025.10   95.6    100
2025.10   95.4    0
2025.10   95      1.25
2025.10   95      84.73
2025.10   94.9    39.2
2025.10   94.9    93.2
2025.10   94.9    59
2025.10   94.8    78
2025.10   94.6    0
2025.10   94.22   49.2
2025.10   94      64
2025.10   93.9    4
2025.10   93.4    100
2025.10   93.3    12.75
2025.10   93.1    22.8
2025.10   92.5    16
2025.10   92.17   96.6