Safety Alignment on HarmBench (Score, Delta)
[Chart: Score and Score Delta (%). Top result: ImplicitRM, Score 98.2, Mar 24, 2026. Updated 24d ago.]
Evaluation Results
Method      Details                    Date     Score  Score Delta (%)
ImplicitRM  Policy model=Qwen3-Ins...  2026.03  98.2   8.1
SelectMix   Policy model=Qwen3-Ins...  2026.03  95.78  5.4
LAGAM       Policy model=Qwen3-Ins...  2026.03  95.12  4.7
SDR2        Policy model=Qwen3-Ins...  2026.03  94.24  3.7
ImplicitRM  Policy model=Qwen2.5-I...  2026.03  92.58  10.5
Naive       Policy model=Qwen3-Ins...  2026.03  90.84  -
LAGAM       Policy model=Qwen2.5-I...  2026.03  90.6   8.1
SelectMix   Policy model=Qwen2.5-I...  2026.03  90.33  7.8
SDR2        Policy model=Qwen2.5-I...  2026.03  88.47  5.6
Naive       Policy model=Qwen2.5-I...  2026.03  83.81  -
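The page does not say how Score Delta (%) is computed. The listed values are consistent with a relative percent change over the Naive score for the same policy model, rounded to one decimal. The sketch below checks that reading against the table. The function name `relative_delta` and the data layout are illustrative assumptions, and the policy model labels are kept truncated exactly as they appear on the page.

```python
# Assumption: Score Delta (%) = (score - naive) / naive * 100, rounded to 1 decimal.
# This formula is inferred from the table values; it is not stated on the page.

def relative_delta(score: float, baseline: float) -> float:
    """Percent change of `score` relative to `baseline`, to one decimal place."""
    return round((score - baseline) / baseline * 100, 1)

# Naive baseline score per policy model (labels truncated as on the page).
NAIVE = {"Qwen3-Ins...": 90.84, "Qwen2.5-I...": 83.81}

# (method, policy model, score, delta listed on the page)
ROWS = [
    ("ImplicitRM", "Qwen3-Ins...", 98.2, 8.1),
    ("SelectMix", "Qwen3-Ins...", 95.78, 5.4),
    ("LAGAM", "Qwen3-Ins...", 95.12, 4.7),
    ("SDR2", "Qwen3-Ins...", 94.24, 3.7),
    ("ImplicitRM", "Qwen2.5-I...", 92.58, 10.5),
    ("LAGAM", "Qwen2.5-I...", 90.6, 8.1),
    ("SelectMix", "Qwen2.5-I...", 90.33, 7.8),
    ("SDR2", "Qwen2.5-I...", 88.47, 5.6),
]

# Compare the recomputed delta with the listed one for every non-baseline row.
mismatches = [
    (method, model)
    for method, model, score, listed in ROWS
    if relative_delta(score, NAIVE[model]) != listed
]

if __name__ == "__main__":
    for method, model, score, listed in ROWS:
        print(method, model, relative_delta(score, NAIVE[model]), listed)
    print("mismatches:", mismatches)
```

Under this reading, the deltas are only comparable within one policy model: ImplicitRM shows a larger delta on the Qwen2.5 policy model (10.5) than on Qwen3 (8.1) even though its absolute score is lower there, because the Naive baseline is lower.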