XSTest

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Over-refusal | XSTest | Overrefusal Rate: 0 | 102 |
| Safety Evaluation | XSTest Unsafe | False Compliance Rate (FC): 0 | 78 |
| Safety Evaluation | XSTest Safe | FC: 4 | 78 |
| Response Harmfulness Detection | XSTEST-RESP | Response Harmfulness F1: 95.48 | 76 |
| Refusal Rate Evaluation | XSTest Safe (test) | Refusal Rate: 0 | 56 |
| Safety Evaluation | XSTest | F1 Score: 97 | 44 |
| Safety Evaluation | XSTest (test) | XSTest Score: 95 | 36 |
| Adversarial and Jailbreaking Attack Detection | XSTest | AUROC: 0.984 | 35 |
| Safety Evaluation | XSTest (combined) | F1 Score: 100 | 34 |
| Safety Evaluation | XSTest | Safety Score: 98.4 | 32 |
| Over-refusal Evaluation | XSTest | Evaluation Score (avg@4): 100 | 26 |
| Refusal Evaluation | XSTest Unsafe | Refusal Rate: 100 | 25 |
| Over-refusal Evaluation | XSTest Safe | Over-refusal Rate: 1.6 | 25 |
| Safety Alignment | XSTest | Compliance: 95.2 | 21 |
| Harmful prompt detection | XSTest | F1 Score: 97.44 | 20 |
| Harmlessness | XsTest | Refusal Rate: 99 | 20 |
| Safety Classification | XSTest (test) | F1: 92.91 | 20 |
| Safety Evaluation | XSTest | FRR: 1.6 | 19 |
| Safety Evaluation | XsTest | Harmful Rate: 2 | 16 |
| Response Classification | XSTest Text Response | F1 Score: 98.43 | 16 |
| Safety Classification | XSTest | F1 Score: 94 | 16 |
| Prompt classification | XSTest | F1 Score: 94.8 | 16 |
| Safety Evaluation | XSTest Toxic | Safety: 95 | 15 |
| Refusal Evaluation | XSTest Seemingly Toxic Subsets | XS: 98 | 15 |
| Model Utility Evaluation | XSTest | CR: 96.4 | 14 |
Showing 25 of 69 rows