SOTA Safety Moderation benchmarks and papers with code | Wizwand
Safety Moderation
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| Harmbench | Qwen3Guard-8B-Gen-strict | F1 Score | 87.2 | 26 | 11d ago |
| WILDJAILBREAK (val) | WILDGUARD | ASR | 0.7 | 18 | 1mo ago |
| Wild Guard Response | WildGuard | F1 Score | 88.9 | 12 | 1mo ago |
| GuarEval Prompt | WildGuard | F1 Score | 88.9 | 10 | 1mo ago |
| Public Safety Benchmark Suite Average AR | PolyGuard-Ministral | F1 Score | 83 | 8 | 1mo ago |
| Public Safety Benchmark Suite Average (EN) | Wildguard | F1 Score | 86 | 8 | 1mo ago |
| XSTest AR | FanarGuard | F1 Score | 88 | 8 | 1mo ago |
| XSTest EN | Wildguard | F1 Score | 95 | 8 | 1mo ago |
| Wild Guard AR | PolyGuard-Ministral | F1 Score | 78 | 8 | 1mo ago |
| Wild Guard EN | PolyGuard-Ministral | F1 Score | 78 | 8 | 1mo ago |
| Safe RLHF AR | FanarGuard | F1 Score | 92 | 8 | 1mo ago |
| Safe RLHF EN | MD-Judge | F1 Score | 93 | 8 | 1mo ago |
| Harm Bench AR | PolyGuard-Ministral | F1 Score | 85 | 8 | 1mo ago |
| RobloxGuard Eval | Roblox Guard 1.0 | F1 Score | 79.6 | 7 | 1mo ago |
| SafeRLHF | Roblox Guard 1.0 | F1 Score | 69.9 | 7 | 1mo ago |
| Aegis Response 2.0 | NemoGuard | F1 Score | 87.6 | 7 | 1mo ago |
| XSTest | BingoGuard | F1 Score | 94.9 | 7 | 1mo ago |
| WildGuard Prompt | Roblox Guard 1.0 | F1 Score | 89.5 | 7 | 1mo ago |
| SimpleSafetyTest | Roblox Guard 1.0 | F1 Score | 100 | 7 | 1mo ago |
| OAI Mod | ShieldGemma | F1 Score | 82.1 | 7 | 1mo ago |
| Aegis Prompt 2.0 | Roblox Guard 1.0 | F1 Score | 87.9 | 7 | 1mo ago |
| Aegis Prompt 1.0 | Roblox Guard 1.0 | F1 Score | 91.9 | 7 | 1mo ago |
| Beaver Response | WildGuard | F1 Score | 84.4 | 5 | 1mo ago |
| Nemo-Safety Response | WildGuard | F1 Score | 0.835 | 5 | 1mo ago |
| GuarEval Response | GGuard | F1 Score (Safety Moderation) | 79.4 | 5 | 1mo ago |
Showing 25 of 27 rows