
WildTeaming

Benchmarks

Task Name                          Dataset Name                      SOTA Result                       Trend
Jailbreak Robustness               WildTeaming WJ                    Evaluation Score (avg@4): 95.1    18
Safety Evaluation                  WildTeaming                       H: 89                             16
Safety and Helpfulness Evaluation  WildTeaming                       Harm Rate: 0.3                    15
Safety Evaluation                  WildTeaming 500-example (test)    HarmR: 88.6                       14
LLM Red-Teaming                    WildTeaming Target: Mistral       ASR: 55.5                         2
LLM Red-Teaming                    WildTeaming Llama-3               ASR: 25.4                         2
LLM Red-teaming                    WildTeaming Llama-3.3-70B target  Effectiveness Count: 3,711        2
LLM Red-teaming                    WildTeaming Mistral-7B target     Effectiveness Query Count: 5,414  2