Jailbreak Attack Robustness on LogiBreak
[Chart: LogiBreak ASR; plotted entry: SFT, Sep 30, 2025. Selectable metrics: LogiBreak ASR, OR-Bench Toxic Score, OR-Bench Hard Score, MMLU Accuracy, R-Score, Overall Score.]
Updated 4d ago
Evaluation Results
| Method | Configuration | Date | LogiBreak ASR | OR-Bench Toxic Score | OR-Bench Hard Score | MMLU Accuracy | R-Score | Overall Score |
|---|---|---|---|---|---|---|---|---|
| SFT | ratio=30/70, base_mode... | 2025.09 | 0 | 59.3 | 1.59 | 66.7 | 0 | 15 |
| RepBend | base_model=Llama-3.1-8... | 2025.09 | 13 | 68.7 | 78.9 | 68.2 | 0 | 8.5 |
| ASGUARD | base_model=Llama-3.1-8... | 2025.09 | 13 | 97.1 | 64.9 | 68.1 | 74.7 | 45.8 |
| Llama-3.1-8B-Instruct | | 2025.09 | 30 | 88.5 | 28.9 | 68.2 | - | - |