
Do-Not-Answer

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Safety Alignment | Do-Not-Answer | MD | 0 | 52 |
| Safety Evaluation | Do-Not-Answer | MD Rate | 62.26 | 16 |
| Refusal Detection | Do-Not-Answer Portuguese (test) | Accuracy | 100 | 9 |
| Question Answering | Do-Not-Answer Portuguese Verbose Questions translated and adapted (61 question-answer pairs) | Mean Accuracy | 4.01 | 9 |
| Question Answering | Do-Not-Answer Portuguese Direct Questions translated and adapted (61 question-answer pairs) | Mean Accuracy Score | 4.14 | 9 |
| Safety Evaluation | Do-Not-Answer (test) | ASR | 3.195 | 9 |
| Refusal Evaluation | Do-Not-Answer | Refusal Rate | 95.21 | 7 |
| Jailbreak Attack Evaluation | Do-Not-Answer | ASR | 2.5 | 6 |
| Safety Evaluation | Do-not-Answer | Safety Score | 69.9 | 4 |
| Language Modeling | Do-Not-Answer | PPL | 154.81 | 1 |