SOTA Harmful Content Detection benchmarks and papers with code | Wizwand
Harmful Content Detection
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| HoliSafe-Bench | Qwen3-VL 8B | AUPRC | 75.6 | 49 | 1mo ago |
| UnsafeBench | Gemma | AUPRC | 71.7 | 49 | 1mo ago |
| PHEME New Attacks: ExplainDrive (test) | LLM-SGA/ARHOCD | Accuracy | 82.91 | 15 | 3mo ago |
| Fortress | SInternal | ASR | 18.6 | 12 | 22d ago |
| PHEME Known Attacks: DeepWordBug, TFAdjusted, TREPAT (test) | LLM-SGA/ARHOCD | Accuracy | 85.59 | 10 | 3mo ago |
| ELF-HP | Perspective API | Accuracy | 48.57 | 8 | 1mo ago |
| ELF22 | Perspective API | Accuracy | 43.96 | 8 | 1mo ago |
| CADD | Perspective API | Accuracy | 90.19 | 8 | 1mo ago |
| COVID-HATE | Perspective API | Accuracy | 96.4 | 8 | 1mo ago |
| MT-CONAN | LlamaGuard-1 | Accuracy | 97.24 | 8 | 1mo ago |
| CONAN | LlamaGuard-1 | Accuracy | 98.47 | 8 | 1mo ago |
| Qian-Reddit | OpenAI Moderation | Accuracy | 97.09 | 8 | 1mo ago |
| Qian-Gab | OpenAI Moderation | Accuracy | 99.06 | 8 | 1mo ago |
| FoodGuardBench (test) | FoodGuard-4B | FNR | 2.75 | 7 | 2mo ago |
| BeaverTails Harmful (held-out target labels) | Quotient Transfer | AUROC | 0.793 | 6 | 21d ago |
| Trolling-oriented generations DeepSeek-Llama 70B | Perspective API | Accuracy | 16.24 | 4 | 1mo ago |
| Trolling-oriented generations Llama-3.1 70B | OpenAI Moderation | Accuracy | 26.04 | 4 | 1mo ago |
| Trolling-oriented generations GPT-4o | Perspective API | Accuracy | 19.88 | 4 | 1mo ago |
| CADD DeepSeek generations | Perspective API | Accuracy | 60.13 | 4 | 1mo ago |
| CADD Llama-3.1 generations | Perspective API | Accuracy | 69.18 | 4 | 1mo ago |
| CADD GPT-4o generations | Perspective API | Accuracy | 64.84 | 4 | 1mo ago |
| Ours trolling-oriented synthetic | Perspective API | Accuracy | 19.88 | 4 | 1mo ago |
| Ours CADD-based synthetic | Perspective API | Accuracy | 65.55 | 4 | 1mo ago |
| Standard Harmful Content Datasets Evasion Attack | GAVEL | Phishing | 96 | 3 | 3mo ago |
| Standard Harmful Content Datasets (Goal Hijacking Attack) | GAVEL | Phishing | 96 | 2 | 3mo ago |
Showing 25 of 26 rows