HARMFULQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Jailbreak Attack | HarmfulQA | JADES 56 | 33 |
| Harmlessness evaluation | HarmfulQA | Helpfulness Score 69.4 | 33 |
| LLM Safety and Informativeness Evaluation | HarmfulQA | Safety Rate 98.1 | 15 |
| Safety Evaluation | HARMFULQA various domains | Safety Score (Chinese) 19.17 | 8 |
| Refusal Evaluation | HarmfulQA | Refusal Rate 85.31 | 7 |
| Red-Teaming (Attack Success Rate) | HARMFULQA | ASR 0.702 | 7 |
| Jailbreak Attack Evaluation | HarmfulQA | ASR 16 | 6 |
| Safety and Informativeness Evaluation | HarmfulQA Social science | Safety Score 90 | 4 |
| Safety and Informativeness Evaluation | HarmfulQA Science and Technology | Safety Score 95 | 4 |
| Safety and Informativeness Evaluation | HarmfulQA Philosophy and Ethics | Safety Score 80 | 4 |
| Safety and Informativeness Evaluation | HarmfulQA Mathematics and Logic | Safety Score 76.7 | 4 |
| Safety and Informativeness Evaluation | HarmfulQA Literature and Language | Safety Score 100 | 4 |
| Safety and Informativeness Evaluation | HarmfulQA History and Culture | Safety Score 90 | 4 |
| Safety and Informativeness Evaluation | HarmfulQA Health and Medicine | Safety Score 85 | 4 |
| Safety and Informativeness Evaluation | HarmfulQA Geography and Environment | Safety Rate 95 | 4 |
| Safety and Informativeness Evaluation | HarmfulQA Education and Pedagogy | Safety Score 100 | 4 |
| Safety and Informativeness Evaluation | HarmfulQA Business and Economic | Safety Rate 91 | 4 |
| Language Modeling | HarmfulQA | PPL 83.41 | 1 |