Jailbreak Evaluation on Multilingual Jailbreak Dataset (Evaluation set)
claude-3-haiku: 2.3 JSR (May 16, 2026). Updated 15d ago.
Evaluation Results
Method                       Attributes                 Date     JSR
claude-3-haiku               Family=Claude, Safety...   2026.05  2.3
DeepSeek                     Family=DeepSeek, Safet...  2026.05  2.9
DeepSeek                     Family=DeepSeek, Safet...  2026.05  3
gemini-2.0-flash             Family=Gemini, Safety...   2026.05  3.5
gpt-4.1-mini                 Family=GPT, Safety Con...  2026.05  4.2
claude-haiku-4.5             Family=Claude, Safety...   2026.05  4.4
grok-4-fast-reasoning        Family=Grok, Safety Co...  2026.05  5.3
gpt-4o-mini                  Family=GPT, Safety Con...  2026.05  6.6
gemini-3-flash-preview       Family=Gemini, Safety...   2026.05  11.8
grok-4-1-fast-non-reasoning  Family=Grok, Safety Co...  2026.05  28.1