SOTA Safety Alignment benchmarks and papers with code | Wizwand
Safety Alignment
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| HarmBench | No Steering | ASR | 0 | 88 | 1mo ago |
| Salad Bench | ShaPO-T | MD | 0.68 | 68 | 1mo ago |
| HH-RLHF | ShaPO-T | MD Rate | 1.09 | 68 | 1mo ago |
| Do-Not-Answer | ShaPO-R | MD | 0 | 52 | 1mo ago |
| WildJailbreak | Full FT | Trainable parameters (M) | 15,768.31 | 44 | 1mo ago |
| Visual Adversarial Attacks | Vanilla | ASR | 43.1 | 40 | 19d ago |
| JOOD | MoRAS | ASR | 0 | 40 | 19d ago |
| SORRY-Bench | LED-Merging | ASR | 10.22 | 40 | 1mo ago |
| PKU-SafeRLHF 30K (IID) | ShaPO-T | WR | 89.26 | 36 | 1mo ago |
| AdvBench | SEA | Reward | -0.38 | 32 | 1mo ago |
| Harmful Dataset (test) | Non-Aligned | Harmful Score | 81 | 30 | 1mo ago |
| WildJailbreak | R1 - 8B + UnsafeChain full | Safe@1 | 77.2 | 24 | 11d ago |
| HarmBench | SFT | MD Score | 95 | 18 | 1mo ago |
| Average (Do-Not-Answer, HarmBench, HH-RLHF, Salad Bench) | ShaPO-T | Aggregate Score | 0.59 | 18 | 1mo ago |
| StrongReject | R1 - 7B + UnsafeChain full | Safe@1 | 58 | 18 | 11d ago |
| XSTest | Yi-VL-6B | Compliance | 95.2 | 15 | 9d ago |
| PKU-SafeRLHF | PPO | Gold Reward | 3.92 | 14 | 1mo ago |
| BeaverTails V | SaFeR-ToolKit (+ SFT+GRPO) [3B] | Safety Score | 93.37 | 13 | 1mo ago |
| Safety Alignment | PALM | Multiplicative Gap (Epsilon) | 0.0131 | 12 | 11d ago |
| HEx-PHI | DiaBlo | HEx-PHI Score | 98.8 | 12 | 1mo ago |
| HarmBench | ImplicitRM | Score | 98.2 | 10 | 24d ago |
| Safety Alignment Dataset 4-order (test) | MOSAIC-2 | DSR | 100 | 10 | 1mo ago |
| Safety Alignment Dataset 3-order (test) | MOSAIC-5 | DSR | 100 | 10 | 1mo ago |
| Safety Alignment Dataset 2-order (test) | MOSAIC-5 | DSR | 99.8 | 10 | 1mo ago |
| Safety Alignment Dataset 1-order (test) | MOSAIC-2 | DSR | 100 | 10 | 1mo ago |
Showing 25 of 40 rows