HARMFULQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Jailbreak Attack | HarmfulQA | JADES: 56 | 33 |
| Harmlessness evaluation | HarmfulQA | Helpfulness Score: 69.4 | 33 |
| Safety Evaluation | HARMFULQA various domains | Safety Score (Chinese): 19.17 | 8 |
| Refusal Evaluation | HarmfulQA | Refusal Rate: 85.31 | 7 |
| Red-Teaming (Attack Success Rate) | HARMFULQA | ASR: 0.702 | 7 |
| Jailbreak Attack Evaluation | HarmfulQA | ASR: 16 | 6 |
| Language Modeling | HarmfulQA | PPL: 83.41 | 1 |