SOTA Toxicity Mitigation benchmarks and papers with code | Wizwand
Toxicity Mitigation
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| Toxicity Mitigation Dataset 1000 trials (test) | S-PID | CLS Toxicity (%) | 0.04 | 58 | 1mo ago |
| RealToxicityPrompts challenging | RAD | Avg Toxicity (Max) | 6.2 | 46 | 3mo ago |
| ToxTET | None (original model) | ToxTET Rate | 29.39 | 33 | 3mo ago |
| Toxicity prompts | A-LQR | CLS Toxicity (%) | 0.12 | 32 | 1mo ago |
| Toxicity Mitigation Task | INJECTION | Generation Speed (s/item) | 1.03 | 30 | 3mo ago |
| ToxicTop | DATG-P | Relevance | 0.458 | 30 | 3mo ago |
| ToxicRandom | DATG-P | Relevance | 45.1 | 30 | 3mo ago |
| REALTOXICITYPROMPTS | EIGENSHIFT + Targeted Subspace Intervention | Toxicity | 21.24 | 24 | 3mo ago |
| RealToxicityPrompts 1k samples | PID-AcT | CLS Toxicity | 0.51 | 20 | 2mo ago |
| RealToxicityPrompts (test) | DGLM | Full Toxicity | 10.1 | 14 | 3mo ago |
| Real Toxicity Prompts Challenging Subset | M+ | Avg Max Toxicity | 0.189 | 12 | 1mo ago |
| RealToxicityPrompts (RTP) | CHaRS | CLS Tox Rate | 0.53 | 12 | 3mo ago |
| ATTAQ | M+ | Average Max Toxicity | 0.122 | 9 | 1mo ago |
| ToxicTop (test) | CONTINUATION | Perplexity | 33.21 | 6 | 3mo ago |
| ToxicRandom (test) | CONTINUATION | Perplexity | 27.21 | 6 | 3mo ago |
| DailyDialog | Attack w/o defense | RTR (Backdoor Non-toxic) | 1.16 | 3 | 2mo ago |
| Specialized category Manually-designed jailbreak attacks | Optimus (CH) | RTR | 4.5 | 3 | 2mo ago |
| Toxic (test) | G | Toxicity Score | 56 | 2 | 3mo ago |
| Specialized category Optimization-based jailbreak attacks | - | - | | 0 | 2mo ago |
Showing 19 of 19 rows