SOTA Safety Detection benchmarks and papers with code | Wizwand
Safety Detection
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| AgentHazard Strongest | BraveGuard-Qwen3-Guard-4B | Accuracy | 90.87 | 56 | 1d ago |
| Alpaca + XSTest full (val-selected) | wopt | AUROC | 0.991 | 42 | 1mo ago |
| chat 1m (test) | MLP-Cls. | MCA Accuracy | 100 | 21 | 1mo ago |
| Harmbench (test) | ReGA | MCA Accuracy | 100 | 21 | 1mo ago |
| Polyguard Social Media | T3 | AUROC | 96.73 | 18 | 3mo ago |
| Polyguard HR | T3 | AUROC | 0.9982 | 18 | 3mo ago |
| Polyguard Education | T3 | AUROC | 99.43 | 18 | 3mo ago |
| Polyguard Cyber | T3 | AUROC | 0.9886 | 18 | 3mo ago |
| Polyguard Code | T3 | AUROC | 0.9959 | 18 | 3mo ago |
| ATBench-500 | GPT-5.2 | Accuracy | 90 | 14 | 1d ago |
| SafeDialBench (full) | Nemotron | Recall | 99 | 12 | 3mo ago |
| XSafety | T3+OCSVM | Safety Score (De) | 0.9815 | 8 | 3mo ago |
| RTP LX | DUOGUARD | Safety Score (De) | 98.76 | 8 | 3mo ago |
| Alpaca and AdvBench Conversation Level (test) | ReGA | MCA Accuracy | 100 | 7 | 1mo ago |
| Alpaca and AdvBench Prompt Level (test) | ReGA | Accuracy (MCA) | 100 | 7 | 1mo ago |
| DoNotAnswer (held-out) | Geometry-Lite | AUROC | 97.4 | 5 | 13d ago |
| WildJailbreak (held-out) | Geometry-Lite | AUROC | 99 | 5 | 13d ago |
| XSTest (held-out) | Geometry-Lite | AUROC | 99.6 | 5 | 13d ago |
| JBB-Behaviors (held-out) | MultiLayer-DIM | AUROC | 93.7 | 5 | 13d ago |
| PKU-SafeRLHF (held-out) | MultiLayer-DIM | AUROC | 90.6 | 5 | 13d ago |
| ToxicChat (held-out) | MultiLayer-DIM | AUROC | 87.7 | 5 | 13d ago |
| BeaverTails (held-out) | MultiLayer-Linear | AUROC | 76.7 | 5 | 13d ago |
| Full (test) | Geometry-Lite | TPR @ 3% FPR | 64.3 | 5 | 13d ago |
| TS-Bench AgentDojo-Traj (eval) | TS-Guard | Efficiency (s/sample) | 1.36 | 4 | 3mo ago |
| TS-Bench AgentHarm-Traj (eval) | TS-Guard | Latency (s/sample) | 1.36 | 4 | 3mo ago |
Showing 25 of 31 rows