LLM Jailbreaking on 100-query jailbreak set
[Chart: Jailbreak Success Rate. Top entry: Reward-Guided RRT, 46.4 (Jul 10, 2025). Legend: Jailbreak Success Rate, Blocks (Realized Jailbreaks), Blocks (Simulated Threats), Successful Jailbreaks Rate.]
Updated 1mo ago
Evaluation Results
| Method | Model | Date | Jailbreak Success Rate | Blocks (Realized Jailbreaks) | Blocks (Simulated Threats) | Successful Jailbreaks Rate |
|---|---|---|---|---|---|---|
| Reward-Guided RRT | DeepSeek-V3 | 2025.07 | 46.4 | 2.7 | 4.2 | 17.7 |
| Reward-Guided RRT | Gemini-2.5-Flash | 2025.07 | 36 | 8.4 | 2.8 | 23.4 |
| Baseline RRT | DeepSeek-V3 | 2025.07 | 34.8 | 6.8 | 3.8 | 7.2 |
| Reward-Guided RRT | Llama-3.1-70B | 2025.07 | 33.8 | 13.4 | 3.4 | 27.2 |
| Reward-Guided RRT | Qwen-Plus | 2025.07 | 31 | 7.6 | 2.4 | 18 |
| Baseline RRT | Qwen-Plus | 2025.07 | 29.4 | 5.6 | 9.2 | 7.4 |
| Baseline RRT | Llama-3.1-70B | 2025.07 | 27.2 | 14.4 | 7.4 | 19.4 |
| Baseline RRT | Gemini-2.5-Flash | 2025.07 | 26.2 | 6.4 | 5 | 14.2 |
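The results above list each model once under Reward-Guided RRT and once under Baseline RRT, so the per-model difference in Jailbreak Success Rate can be read off with simple subtraction. The following is a minimal sketch of that arithmetic, not part of the benchmark itself: the `RESULTS` dictionary and the `success_rate_gain` helper are hypothetical names introduced here for illustration, and the numbers are copied from the Jailbreak Success Rate column as listed.

```python
# Jailbreak Success Rate values copied from the Evaluation Results listing.
# RESULTS and success_rate_gain are illustrative names, not from the benchmark.
RESULTS = {
    "DeepSeek-V3": {"Reward-Guided RRT": 46.4, "Baseline RRT": 34.8},
    "Gemini-2.5-Flash": {"Reward-Guided RRT": 36.0, "Baseline RRT": 26.2},
    "Llama-3.1-70B": {"Reward-Guided RRT": 33.8, "Baseline RRT": 27.2},
    "Qwen-Plus": {"Reward-Guided RRT": 31.0, "Baseline RRT": 29.4},
}


def success_rate_gain(results):
    """Return {model: Reward-Guided minus Baseline}, rounded to one decimal."""
    return {
        model: round(rates["Reward-Guided RRT"] - rates["Baseline RRT"], 1)
        for model, rates in results.items()
    }


gains = success_rate_gain(RESULTS)

# Print models from largest to smallest gain, in the same units as the table.
for model, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {gain:+.1f}")
```

Under these figures the gap is largest for DeepSeek-V3 and smallest for Qwen-Plus. The same subtraction can be applied to the other three columns by swapping in their values.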