Jailbreak Detection on HB
[Chart: Correctness Rate (COR); labeled point: gpt-4o-mini, 100, Feb 14, 2026]
Evaluation Results
| Method | Configuration | Date | Correctness Rate (COR) |
|---|---|---|---|
| gpt-4o-mini | Version=2024-07-18 | 2026.02 | 100 |
| gpt-5-mini | Version=2025-08-07 | 2026.02 | 100 |
| AISA | Backbone=Qwen3-8b-I | 2026.02 | 99.5 |
| AISA | Backbone=Mistral-7b-I | 2026.02 | 99 |
| AISA | Backbone=Llama2-13b-I | 2026.02 | 99 |
| AISA | Backbone=Llama3.1-8b-I | 2026.02 | 98.5 |
| gpt-4.1-mini | Version=2025-04-14 | 2026.02 | 98 |
| Jailbreak-Classifier | | 2026.02 | 98 |
| AISA | Backbone=GPT-OSS-20b-I | 2026.02 | 97.5 |
| GradSafe | | 2026.02 | 95.5 |
| NemoGuard-JailbreakDetect | | 2026.02 | 87.5 |
| SPDetector | | 2026.02 | 68 |
| Llama-Prompt-Guard-2 | Parameters=86M | 2026.02 | 20 |