Attack Detection on Harmful Attacks (105K-sample set)
Top result: LlamaGuard, 97.4 Detection Rate
[Chart: Detection Rate by method, Feb 15, 2026]
Updated 4d ago
Evaluation Results
Method          | Method details             | Links   | Detection Rate
LlamaGuard      | acronym=LG                 | 2026.02 | 97.4
Llama-as-Judge  | acronym=LJ, prompting=...  | 2026.02 | 85.8
LogReg (Ours)   | input=raw activations,...  | 2026.02 | 68
LogReg (Ours)   | input=raw activations,...  | 2026.02 | 67
PromptGuard 2   | acronym=PG                 | 2026.02 | 37.3
PromptGuard 2   | acronym=PG                 | 2026.02 | 36.7
LlamaGuard      | acronym=LG                 | 2026.02 | 27.4
Llama-as-Judge  | acronym=LJ, prompting=...  | 2026.02 | 7.1
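The page does not define Detection Rate. Assuming it is the percentage of harmful (attack) samples that a method flags, it can be computed as in the minimal sketch below. The function name `detection_rate` and the list-of-booleans input are hypothetical choices for illustration, not the benchmark's own scoring code.

```python
def detection_rate(flagged: list[bool]) -> float:
    """Hypothetical definition: percentage of harmful samples a detector flagged.

    Each entry in `flagged` is True if the method flagged that harmful sample.
    """
    if not flagged:
        raise ValueError("need at least one harmful sample")
    # sum() counts True values as 1, so this is (#flagged / #samples) * 100
    return 100.0 * sum(flagged) / len(flagged)


if __name__ == "__main__":
    # Three of four harmful samples flagged -> 75.0
    print(detection_rate([True, True, False, True]))
```

Under this assumed definition, a score such as 97.4 would mean that roughly 97.4% of the harmful samples in the set were flagged by that method.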