Toxic Comment Detection on Toxic Comment
[Chart: ASR. Criteria Attack: 47.3 (Jan 15, 2026)]
Updated 4d ago
Evaluation Results
Method               Setting            Date      ASR
Criteria Attack      Defense=StruQ      2026.01   47.3
Combined             Defense=StruQ      2026.01   18.1
Topic                Defense=SecAlign   2026.01   15.4
Criteria Attack      Defense=SecAlign   2026.01   15.2
Ignore               Defense=StruQ      2026.01   14.2
Topic                Defense=StruQ      2026.01   12.3
Escape Separation    Defense=StruQ      2026.01   12.1
Fake Completion      Defense=StruQ      2026.01   11.9
Ignore               Defense=SecAlign   2026.01   9.1
Separator Injection  Defense=StruQ      2026.01   8.8
Escape Separation    Defense=SecAlign   2026.01   4.7
Combined             Defense=SecAlign   2026.01   2.1
Fake Completion      Defense=SecAlign   2026.01   1.4
Separator Injection  Defense=SecAlign   2026.01   1.4