
HARMFULQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Jailbreak Attack | HarmfulQA | JADES: 56 | 33 |
| Harmlessness evaluation | HarmfulQA | Helpfulness Score: 69.4 | 33 |
| Safety Evaluation | HARMFULQA various domains | Safety Score (Chinese): 19.17 | 8 |
| Red-Teaming (Attack Success Rate) | HARMFULQA | ASR: 0.702 | 7 |
| Jailbreak Attack Evaluation | HarmfulQA | ASR: 16 | 6 |
| Language Modeling | HarmfulQA | PPL: 83.41 | 1 |