Jailbreak Robustness on Mousetrap
[Chart: Harmfulness Rate; labeled point: SAFEPATH-ZS, Aug 6, 2025]
Updated 27d ago
Evaluation Results

Method          Model         Date      Harmfulness Rate
SAFEPATH-ZS     R1-Qwen-14B   2025.08   0
ReasoningGuard  R1-Qwen-14B   2025.08   0
SmoothLLM       R1-Qwen-32B   2025.08   0
ThinkingI       R1-Qwen-32B   2025.08   0
ReasoningGuard  R1-Qwen-32B   2025.08   0
Self-Reminder   R1-Qwen-14B   2025.08   2
SAFEPATH-ZS     R1-Qwen-32B   2025.08   2
RealSafe-R1     R1-Qwen-14B   2025.08   4
SmoothLLM       R1-Qwen-14B   2025.08   4
ThinkingI       R1-Qwen-14B   2025.08   4
RealSafe-R1     R1-Qwen-32B   2025.08   4
Self-Reminder   R1-Qwen-32B   2025.08   4
No Defense      R1-Qwen-14B   2025.08   6
SafeKey         R1-Qwen-14B   2025.08   8
No Defense      R1-Qwen-32B   2025.08   8
Paraphrase      R1-Qwen-14B   2025.08   12
Paraphrase      R1-Qwen-32B   2025.08   12