Jailbreaking on WildJailbreak (WJB)
[Chart: ASR@1 (Qwen2.5-7B-IT), with series also listed for ASR@1 (Llama3.1-8B-IT) and ASR@1 (gpt-oss-20b). Top entry: TRACE (single), 89.5, May 9, 2026. Updated 22d ago.]
Evaluation Results
| Method         | Workflow Category | Date    | ASR@1 (Qwen2.5-7B-IT) | ASR@1 (Llama3.1-8B-IT) | ASR@1 (gpt-oss-20b) |
|----------------|-------------------|---------|-----------------------|------------------------|---------------------|
| TRACE (single) | Mult...           | 2026.05 | 89.5                  | 73                     | 79                  |
| TRACE (mix)    | Mult...           | 2026.05 | 85                    | 68.17                  | 78                  |
| Siren          | Mult...           | 2026.05 | 82                    | 56                     | 61                  |
| TROJail        | Mult...           | 2026.05 | 79.44                 | 41.33                  | 68                  |
| AutoDan-Turbo  | Sing...           | 2026.05 | 69                    | 32                     | 3                   |
| Jailbreak-R1   | Sing...           | 2026.05 | 62                    | 41.5                   | 10.5                |
| Crescendo      | Mult...           | 2026.05 | 58                    | 10.5                   | 3.5                 |
| ActorAttack    | Mult...           | 2026.05 | 54.5                  | 36.5                   | 47.5                |
| X-Teaming      | Mult...           | 2026.05 | 43.5                  | 30.5                   | 27                  |
| MUSE-A         | Mult...           | 2026.05 | 42.5                  | 19.5                   | 5.5                 |
| PAIR           | Sing...           | 2026.05 | 37.5                  | 31                     | 4                   |