SOTA Safety Alignment benchmarks and papers with code | Wizwand
Safety Alignment
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| HarmBench | No Steering | ASR | 0 | 88 | 3mo ago |
| Salad Bench | ShaPO-T | MD | 0.68 | 68 | 2mo ago |
| HH-RLHF | ShaPO-T | MD Rate | 1.09 | 68 | 2mo ago |
| Do-Not-Answer | ShaPO-R | MD | 0 | 52 | 2mo ago |
| WildJailbreak | Full FT | Trainable parameters (M) | 15,768.31 | 44 | 3mo ago |
| Visual Adversarial Attacks | Vanilla | ASR | 43.1 | 40 | 2mo ago |
| JOOD | MoRAS | ASR | 0 | 40 | 2mo ago |
| SORRY-Bench | LED-Merging | ASR | 10.22 | 40 | 22d ago |
| PKU-SafeRLHF 30K (IID) | ShaPO-T | WR | 89.26 | 36 | 3mo ago |
| AdvBench | SEA | Reward | -0.38 | 32 | 2mo ago |
| Harmful Dataset (test) | Non-Aligned | Harmful Score | 81 | 30 | 3mo ago |
| AdvBench | BoN64 | Harm Rate | 0 | 25 | 7d ago |
| WildJailbreak | R1 - 8B + UnsafeChain full | Safe@1 | 77.2 | 24 | 1mo ago |
| Safety Benchmarks (Sorry-bench, StrongREJECT, WildJailbreak, JBB-PAIR, JBB-GCG) | SafeChain | Average Score | 42.34 | 21 | 1mo ago |
| XSTest | Yi-VL-6B | Compliance | 95.2 | 21 | 22d ago |
| HEx-PHI | DiaBlo | HEx-PHI Score | 98.8 | 18 | 14d ago |
| HarmBench | SFT | MD Score | 95 | 18 | 3mo ago |
| Average (Do-Not-Answer, HarmBench, HH-RLHF, Salad Bench) | ShaPO-T | Aggregate Score | 0.59 | 18 | 3mo ago |
| StrongReject | R1 - 7B + UnsafeChain full | Safe@1 | 58 | 18 | 1mo ago |
| PKU-SafeRLHF | PPO | Gold Reward | 3.92 | 14 | 3mo ago |
| BeaverTails V | SaFeR-ToolKit (+ SFT+GRPO) [3B] | Safety Score | 93.37 | 13 | 3mo ago |
| Safety Alignment | PALM | Multiplicative Gap (Epsilon) | 0.0131 | 12 | 1mo ago |
| HarmBench | ImplicitRM | Score | 98.2 | 10 | 2mo ago |
| Safety Alignment Dataset 4-order (test) | MOSAIC-2 | DSR | 100 | 10 | 2mo ago |
| Safety Alignment Dataset 3-order (test) | MOSAIC-5 | DSR | 100 | 10 | 2mo ago |
Showing 25 of 52 rows