Jailbreak Safety Evaluation on StrongREJECT (test)
[Chart: Overall Score, STAIR-DPO, Sep 15, 2025. Selectable metrics: Overall Score, PAIR-GPT Score, PAP Score, DeepInception Score, Average Score.]
Updated 1mo ago
Evaluation Results
| Method | Method details | Date | Overall Score | PAIR-GPT Score | PAP Score | DeepInception Score | Average Score |
|---|---|---|---|---|---|---|---|
| STAIR-DPO | Base Model=Llama3.1-8B... | 2025.09 | 100 | 68.37 | 94.25 | 98.72 | 90.34 |
| Base | Base Model=Llama3.1-8B... | 2025.09 | 99.68 | 35.14 | 64.86 | 52.4 | 63.02 |
| WJ-SFT | Base Model=Llama3.1-8B... | 2025.09 | 99.36 | 44.73 | 76.04 | 98.4 | 79.63 |
| ReSA-SFT | Base Model=Llama3.1-8B... | 2025.09 | 99.36 | 89.78 | 96.81 | 99.36 | 96.33 |
| ReSA-SFT | Base Model=Llama3.1-8B... | 2025.09 | 98.72 | 78.27 | 95.85 | 97.76 | 92.65 |
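The page does not say how Average Score is derived. The listed values are consistent with an unweighted mean of the four other scores (Overall, PAIR-GPT, PAP, DeepInception), rounded half-up to two decimals. The sketch below checks that assumption against the five rows; the names `ROWS`, `mean_score`, and `check_rows` are hypothetical and exist only for this illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the results listed above, kept as strings so that
# Decimal parses them exactly (no binary floating-point rounding).
# Layout: (method, overall, pair_gpt, pap, deep_inception, reported_average)
ROWS = [
    ("STAIR-DPO", "100", "68.37", "94.25", "98.72", "90.34"),
    ("Base", "99.68", "35.14", "64.86", "52.4", "63.02"),
    ("WJ-SFT", "99.36", "44.73", "76.04", "98.4", "79.63"),
    ("ReSA-SFT", "99.36", "89.78", "96.81", "99.36", "96.33"),
    ("ReSA-SFT", "98.72", "78.27", "95.85", "97.76", "92.65"),
]


def mean_score(scores):
    """Unweighted mean of the given scores, rounded half-up to 2 decimals.

    Assumption: this is how the page computes Average Score; the page
    itself does not document the formula.
    """
    total = sum(Decimal(s) for s in scores)
    return (total / len(scores)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def check_rows(rows):
    """Return (method, computed_mean, reported_average) for each row."""
    return [
        (name, mean_score(scores), Decimal(reported))
        for name, *scores, reported in rows
    ]


if __name__ == "__main__":
    for name, computed, reported in check_rows(ROWS):
        status = "match" if computed == reported else "MISMATCH"
        print(f"{name:10s} computed={computed} reported={reported} {status}")
```

Under this assumption all five reported averages are reproduced. One consequence of an unweighted mean is that the near-saturated Overall Score (98.72 to 100 for every method) contributes little to the ranking, which is driven mostly by the PAIR-GPT and PAP columns.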