SOTA Red Teaming benchmarks and papers with code | Wizwand
Red Teaming
Benchmarks
| Dataset Name | SOTA Method | Metric | Metric Value | Results | Last Updated |
|---|---|---|---|---|---|
| 50 harmful goals (Manual evaluation) | PAIR | Hard ASR | 100 | 30 | 1mo ago |
| CatQA | SafeTransformer | ASR | 0 | 20 | 1mo ago |
| AdversarialQA | SafeTransformer | ASR | 0 | 20 | 1mo ago |
| Religious Discrimination principle v1 (test) | QCI | Mean Best Category Score | 5.32 | 12 | 1mo ago |
| Illegal Activity principle v1 (test) | RS | Mean Score (Best Category) | -2.73 | 12 | 1mo ago |
| AI Supremacy principle v1 (test) | CRL | Mean Best Category Score | 11.7 | 12 | 1mo ago |
| AdvBench (test) | AMIS | ASR | 88 | 8 | 15d ago |
| DailyDialog against DialoGPT-large | BRT (e+r) | RSR | 40 | 8 | 1mo ago |
| DailyDialog against BB-3B | BRT (e+r) | RSR | 40.2 | 8 | 1mo ago |
| ConvAI2 (filtered hard positive) | BRT (e+r) | RSR | 2,120 | 7 | 1mo ago |
| Bloom ZS (filtered hard positive) | BRT (e+r) | RSR | 15.6 | 7 | 1mo ago |
| BAD Against Friend Chat (test) | BRT (e) | RSR | 64.2 | 7 | 1mo ago |
| BAD Against Marv (test) | BRT (s+r) | RSR | 88.1 | 7 | 1mo ago |
| Korean red teaming dataset (test) | Exaone-3.5-2.4B-inst | Attack Success Rate | 0.5797 | 5 | 1mo ago |
| HarmBench Claude-Sonnet-3.5 (held-out test) | AGENTICRED | ASR | 60 | 5 | 1mo ago |
| HarmBench Llama-3-8B (test) | AGENTICRED | ASR | 0.98 | 5 | 1mo ago |
| HarmBench Llama-2-7B (test) | AutoDAN-Turbo | ASR | 36 | 5 | 1mo ago |
| KT RAIC proprietary Korean red-teaming dataset | EXAONE-4.0-32B | Attack Success Rate | 54 | 4 | 29d ago |
| Wan Seed-free generation 2.2 | ART | Violence Rate | 40 | 3 | 1mo ago |
| Hunyuan-Video Seed-free generation | TEAR | Violence Rate | 60 | 3 | 1mo ago |
| HarmBench (test) | DIALTREE | ASR@1 | 85.1 | 3 | 1mo ago |
| HarmBench Claude-4-Sonnet (test) | DIALTREE | ASR@1 | 71 | 3 | 1mo ago |
| HarmBench gpt-4o-2024-08-06 (test) | AdvReasoning | ASR | 86 | 3 | 1mo ago |
| HarmBench gpt-3.5-turbo-0125 (test) | TransferAttack | ASR | 80 | 3 | 1mo ago |
| ConvAI2 (test) | BRT (e) | P Score | 186 | 3 | 1mo ago |
Showing 25 of 25 rows