Multi-turn Jailbreak Detection on HarmBench and DEFCON Multi-turn Jailbreak N=1,010 (test)
[Chart: F1 Score by method. Highlighted result: DeepContext, F1 Score 84, Feb 18, 2026. Selectable metrics: F1 Score, Recall, Precision, MTTD.]
Updated 4d ago
Evaluation Results

| Method | Details | Date | F1 Score | Recall | Precision | MTTD |
|---|---|---|---|---|---|---|
| DeepContext | | 2026.02 | 84 | 83 | 86 | 4.24 |
| Llama-Prompt-Guard-2 | Parameters=86M | 2026.02 | 67 | 60 | 76 | 5.83 |
| Granite-Guardian-3.3 | Parameters=8B | 2026.02 | 67 | 57 | 83 | 5.03 |
| Gpt5-Nano | | 2026.02 | 65 | 55 | 81 | 5.73 |
| Deberta-v3-Prompt-Injection | Backbone=Deberta-v3 | 2026.02 | 62 | 61 | 62 | 5.27 |
| GCP Model Armor | | 2026.02 | 58 | 54 | 63 | 4.56 |
| Qwen3Guard-Gen | Parameters=8B | 2026.02 | 51 | 36 | 84 | 3.12 |
| Llama-Guard-4 | Parameters=12B | 2026.02 | 51 | 42 | 65 | 6.09 |
| AWS Prompt Attack Guardrails | | 2026.02 | 38 | 40 | 36 | 5.61 |
| Azure Prompt Shield | | 2026.02 | 19 | 11 | 62 | 8 |
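
The F1 Score values in the results above can be sanity-checked against the listed Precision and Recall. The page does not state its formula, so this is a minimal sketch that assumes the standard definition of F1 as the harmonic mean of precision and recall. The helper name `f1_from_precision_recall` is hypothetical. Because the listed values are rounded to whole numbers, a recomputed score may differ from the listed one by about a point.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return F1 as the harmonic mean of precision and recall.

    Both inputs are on the same scale (here, percentages as listed
    on the leaderboard). Assumes the standard F1 definition; the
    page itself does not state how its scores were computed.
    """
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# DeepContext row: Precision 86, Recall 83, listed F1 Score 84.
deepcontext_f1 = f1_from_precision_recall(86, 83)
print(round(deepcontext_f1))  # 84
```

A gap between precision and recall pulls the harmonic mean toward the lower of the two, which is why methods with high Precision but low Recall (for example Qwen3Guard-Gen or Azure Prompt Shield) rank low on F1 Score.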