Safety Alignment Aggregate (Do-Not-Answer, HarmBench, HH-RLHF, Salad Bench)
Top-listed entry: ShaPO-T, Aggregate Score 0.59 (Feb 7, 2026)
Evaluation Results
Method    Backbone        Date      Aggregate Score
ShaPO-T   LLaMA-3.2-3B    2026.02   0.59
ShaPO-R   LLaMA-3.2-3B    2026.02   0.99
Dr.DPO    LLaMA-3.2-3B    2026.02   1.2
rDPO      LLaMA-3.2-3B    2026.02   3.27
ShaPO-R   Pythia-2.8B     2026.02   6.91
ShaPO-T   Pythia-2.8B     2026.02   7.71
IPO       LLaMA-3.2-3B    2026.02   8.45
DPO       LLaMA-3.2-3B    2026.02   8.58
cDPO      LLaMA-3.2-3B    2026.02   19.12
Dr.DPO    Pythia-2.8B     2026.02   23.9
rDPO      Pythia-2.8B     2026.02   39.67
DPO       Pythia-2.8B     2026.02   41.21
cDPO      Pythia-2.8B     2026.02   41.9
IPO       Pythia-2.8B     2026.02   42.58
Vanilla   LLaMA-3.2-3B    2026.02   46.46
Vanilla   Pythia-2.8B     2026.02   52.3
SFT       Pythia-2.8B     2026.02   52.68
SFT       LLaMA-3.2-3B    2026.02   59.27