SOTA Safety Classification benchmarks and papers with code | Wizwand
Safety Classification
Benchmarks
| Dataset | SOTA method | Metric | SOTA score | No. of results | Last updated |
|---|---|---|---|---|---|
| SafeRLHF | Qwen3Guard | F1 Score | 0.94 | 48 | 1mo ago |
| WildGuardMix (test) | LEG large | F1 (Unsafe) | 75.83 | 27 | 1mo ago |
| XSTest (test) | LEG large | F1 | 92.91 | 20 | 1mo ago |
| XSTest | AprielGuard | F1 Score | 94 | 16 | 1mo ago |
| Wildguardmix | Apriel Guard | F1 Score | 76 | 15 | 1mo ago |
| HarmBench | IBM Granite Guardian 3.2 | Recall | 100 | 14 | 1mo ago |
| AegisSafetyTest V2 | Qwen3Guard | F1 Score | 87 | 14 | 1mo ago |
| AegisSafety V1 (test) | Qwen3Guard | F1 Score | 92 | 14 | 1mo ago |
| ToxicChat | Qwen3Guard | F1 Score | 0.81 | 14 | 1mo ago |
| XSTestResponse | AprielGuard | F1 Score | 0.96 | 14 | 1mo ago |
| Aya Redteaming | IBM Granite Guardian 3.1 | Recall | 94 | 14 | 1mo ago |
| SimpleSafetyTests | IBM Granite Guardian 3.2 | Recall | 100 | 14 | 1mo ago |
| SafeEditBench | Llama Guard | Policy L1 Success Rate | 100 | 11 | 1mo ago |
| ToxicChat (out-of-distribution) | Multi-head self-attn | F1 Score | 72.88 | 11 | 1mo ago |
| AdvBench | Gradient-Controlled Decoding (GCD) | F1 Score | 99.9 | 10 | 10d ago |
| HarmBench (test) | Oracle | F1 Score | 90.5 | 9 | 1mo ago |
| OAI (test) | Oracle | F1 Score | 86.5 | 9 | 1mo ago |
| WildGuardMix-p (test) | Oracle | F1 Score | 93.2 | 9 | 1mo ago |
| DiaSafety (test) | GAUGE-mean | AUROC | 66.98 | 8 | 1mo ago |
| OS Bench | CLUE | Recall | 0.936 | 8 | 1mo ago |
| GuardSet (test) | GPT-4o-mini | Accuracy (Harmless) | 96.26 | 7 | 9d ago |
| Bloom safety filter benchmark 100 examples (corrected) (test) | Bloom Safety Filter | Accuracy | 100 | 7 | 1mo ago |
| Bloom safety filter benchmark 100 examples (test) | Bloom Safety Filter | Accuracy | 100 | 7 | 1mo ago |
| Bloom safety filter benchmark 400 examples (val) | Bloom Safety Filter | Accuracy | 100 | 7 | 1mo ago |
| 3,000 Polish user prompts (test) | Bielik Guard 0.1B v1.1 | Precision | 77.65 | 7 | 1mo ago |
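Showing 25 of 46 rows. Scores are listed as reported: a few (SafeRLHF, ToxicChat, XSTestResponse, OS Bench) are on a 0 to 1 scale, while the rest are percentages.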
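Most rows above report F1, precision, recall, accuracy or AUROC for a binary safe/unsafe decision. As an illustration only, the Python sketch below shows how these metrics are commonly computed when "unsafe" is treated as the positive class (as in the "F1 (Unsafe)" metric). It is not the scoring code of Wizwand or of any listed benchmark; the function names `binary_metrics` and `auroc` and the toy labels and scores are hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1 and accuracy, treating label 1 ("unsafe") as positive.

    Hypothetical helper for illustration; labels are 1 = unsafe, 0 = safe.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # unsafe, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # safe, wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # unsafe, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # safe, passed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random unsafe example scores higher
    than a random safe one (ties count as 0.5). Hypothetical helper."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one unsafe and one safe example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    labels = [1, 1, 1, 0, 0, 0, 0, 1]
    preds = [1, 0, 1, 0, 1, 1, 0, 1]
    unsafe_scores = [0.9, 0.4, 0.8, 0.1, 0.6, 0.3, 0.2, 0.7]
    print({k: round(v, 3) for k, v in binary_metrics(labels, preds).items()})
    # {'precision': 0.6, 'recall': 0.75, 'f1': 0.667, 'accuracy': 0.625}
    print(auroc(labels, unsafe_scores))
    # 0.9375
```

The pairwise AUROC loop costs O(P·N) comparisons, which is fine for small test sets; for larger sets a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` is the usual choice. Multiplying a 0 to 1 score by 100 gives the percentage form used by most rows in the table.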