SOTA Safety Classification benchmarks and papers with code | Wizwand
Safety Classification
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| AUTALIC (test) | GPT-OSS | F1 Score | 37.7 | 64 | 7d ago |
| HoliSafe-Bench | Qwen3-VL 8B | AUROC | 0.783 | 49 | 1mo ago |
| UnsafeBench | Gemma | AUROC | 80.5 | 49 | 1mo ago |
| SafeRLHF | Qwen3Guard | F1 Score | 0.94 | 48 | 1mo ago |
| WildGuardMix (test) | OSS-Safeguard-20B-High | F1 Score | 95.3 | 47 | 2d ago |
| ToxicChat (test) | D2-TimeAttn | Accuracy | 97.3 | 43 | 7d ago |
| BeaverTails (test) | Separate Guardrail 7B | AUC | 94 | 24 | 1mo ago |
| AEGIS 2.0 (test) | DSA:LST | AUC | 94 | 24 | 1mo ago |
| OpenAI-moderation (test) | TPC | Accuracy | 74.88 | 23 | 7d ago |
| HoliSafe-Bench | Llama | ECE | 8.4 | 21 | 1mo ago |
| UnsafeBench | Llama | ECE | 0.061 | 21 | 1mo ago |
| Pre-Ex-Bench | TRACE | Accuracy | 94.01 | 20 | 1d ago |
| ASSEBench | TRACE | Accuracy | 92.04 | 20 | 1d ago |
| XSTest (test) | LEG large | F1 | 92.91 | 20 | 3mo ago |
| WildGuard (test) | TPC | F1 Score | 88.5 | 17 | 22h ago |
| XSTest | AprielGuard | F1 Score | 94 | 16 | 3mo ago |
| MultiJail | CREST-BASE | F1 Score | 0.9335 | 15 | 1mo ago |
| Wildguardmix | Apriel Guard | F1 Score | 76 | 15 | 3mo ago |
| HarmBench | IBM Granite Guardian 3.2 | Recall | 100 | 14 | 3mo ago |
| AegisSafetyTest V2 | Qwen3Guard | F1 Score | 87 | 14 | 5d ago |
| AegisSafety V1 (test) | Qwen3Guard | F1 Score | 92 | 14 | 3mo ago |
| ToxicChat | Qwen3Guard | F1 Score | 0.81 | 14 | 3mo ago |
| XSTestResponse | AprielGuard | F1 Score | 0.96 | 14 | 3mo ago |
| Aya Redteaming | IBM Granite Guardian 3.1 | Recall | 94 | 14 | 3mo ago |
| SimpleSafetyTests | IBM Granite Guardian 3.2 | Recall | 100 | 14 | 3mo ago |
Showing 25 of 78 rows