XSTest

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Over-refusal | XSTest | Overrefusal Rate: 0 | 78 |
| Refusal Rate Evaluation | XSTest Safe (test) | Refusal Rate: 0 | 56 |
| Response Harmfulness Detection | XSTEST-RESP | Response Harmfulness F1: 95.48 | 34 |
| Safety Evaluation | XSTest (test) | XSTest Score: 95 | 32 |
| Safety Evaluation | XSTest | Safety Score: 98.4 | 23 |
| Harmlessness | XsTest | Refusal Rate: 99 | 20 |
| Adversarial and Jailbreaking Attack Detection | XSTest | AUROC: 0.8418 | 20 |
| Safety Classification | XSTest (test) | F1: 92.91 | 20 |
| Safety Evaluation | XsTest | Harmful Rate: 2 | 16 |
| Response Classification | XSTest Text Response | F1 Score: 98.43 | 16 |
| Safety Classification | XSTest | F1 Score: 94 | 16 |
| Prompt classification | XSTest | F1 Score: 94.8 | 16 |
| Safety Evaluation | XSTest Toxic | Safety: 95 | 15 |
| Refusal Evaluation | XSTest Seemingly Toxic Subsets | XS: 98 | 15 |
| Safety Alignment | XSTest | Compliance: 95.2 | 15 |
| Safety Evaluation | XSTest | ASR: 0.9 | 14 |
| Prompt Classification | XSTest Text Prompt | F1 Score: 93.71 | 14 |
| Safety Evaluation | XSTest | FRR: 1.6 | 14 |
| Safety Classification | XSTestResponse | F1 Score: 0.96 | 14 |
| Safety Evaluation | XSTest (out-of-domain) | Accuracy: 88.67 | 12 |
| Output Moderation | XSTEST-RESP (XSTESTR) | F1 Score: 92.7 | 11 |
| Safety & Helpfulness Evaluation | XSTest | XSTest Score: 74.8 | 11 |
| Prompt-Response Safety Routing | XSTest | Routing F1: 56.21 | 10 |
| Overblocking evaluation | XSTEST-RESP (benign) | F1 Score: 89.9 | 9 |
| Safety Evaluation | XSTest | Safe Compliance: 98.1 | 9 |
Showing 25 of 46 rows