XSTest

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Refusal Rate Evaluation | XSTest Safe (test) | Refusal Rate | 0 | 56 |
| Over-refusal | XSTest | XSTest Score | 86.89 | 42 |
| Response Harmfulness Detection | XSTEST-RESP | Response Harmfulness F1 | 95.48 | 34 |
| Safety Evaluation | XSTest (test) | XSTest Score | 95 | 32 |
| Adversarial and Jailbreaking Attack Detection | XSTest | AUROC | 0.8418 | 20 |
| Safety Classification | XSTest (test) | F1 | 92.91 | 20 |
| Response Classification | XSTest Text Response | F1 Score | 98.43 | 16 |
| Safety Classification | XSTest | F1 Score | 94 | 16 |
| Prompt Classification | XSTest | F1 Score | 94.8 | 16 |
| Prompt Classification | XSTest Text Prompt | F1 Score | 93.71 | 14 |
| Safety Evaluation | XSTest | FRR | 1.6 | 14 |
| Safety Classification | XSTestResponse | F1 Score | 0.96 | 14 |
| Safety Evaluation | XSTest | Safety Score | 97.9 | 13 |
| Safety Evaluation | XSTest (out-of-domain) | Accuracy | 88.67 | 12 |
| Safety Alignment | XSTest | Compliance | 95.2 | 12 |
| Safety & Helpfulness Evaluation | XSTest | XSTest Score | 74.8 | 11 |
| Prompt-Response Safety Routing | XSTest | Routing F1 | 56.21 | 10 |
| Refusal Detection | XSTEST-RESP (full) | RR (F1) | 98.1 | 9 |
| Safety Evaluation | XSTEST | HS Rate | 2.05 | 8 |
| Red-teaming Safety Evaluation | XSTEST | HPR | 61 | 8 |
| Safety Moderation | XSTest | F1 Score | 94.9 | 7 |
| Unsafe Prompt Detection | XSTest (test) | Precision | 87.8 | 7 |
| Over-refusal Evaluation | XSTest (test) | Over-refusal Rate | 0.035 | 4 |
| General LLM Evaluation | XSTest | Refusal Rate | 23.1 | 4 |
| Unsafe Prompt Detection | XSTest | AUPRC | 93.6 | 4 |
Showing 25 of 27 rows
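Most rows above report either a classification metric (F1, precision) or a rate (refusal rate, over-refusal rate). The leaderboard does not say how each entry computed its number. The following is therefore only a minimal sketch of the standard definitions. It assumes per-prompt boolean labels and treats "unsafe" (or "refused") as the positive class. The function names `refusal_rate` and `precision_recall_f1` are hypothetical. Some entries appear to be on a 0-1 scale (e.g. 0.96) while others look like percentages (e.g. 95.48), so scale by 100 where needed before comparing.

```python
from typing import Sequence


def refusal_rate(refused: Sequence[bool]) -> float:
    """Fraction of responses judged to be refusals, on a 0-1 scale.

    Hypothetical helper: computed over safe prompts only, this is the
    usual notion of an over-refusal rate.
    """
    if not refused:
        raise ValueError("refusal_rate needs at least one response")
    return sum(refused) / len(refused)


def precision_recall_f1(
    y_true: Sequence[bool], y_pred: Sequence[bool]
) -> tuple[float, float, float]:
    """Binary precision, recall and F1 with True as the positive class.

    Hypothetical helper; undefined ratios (zero denominators) are
    reported as 0.0, a common but not universal convention.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count true positives, false positives and false negatives.
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels: True = prompt is unsafe / classifier flags it as unsafe.
    print(refusal_rate([True, False, False, False]))  # 0.25
    print(precision_recall_f1([True, True, False, False],
                              [True, False, True, False]))  # (0.5, 0.5, 0.5)
```

Ranking metrics such as AUROC and AUPRC need per-prompt scores rather than hard labels, so they are not covered by this sketch.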