| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Over-refusal | XSTest | Overrefusal Rate: 0 | 78 |
| Refusal Rate Evaluation | XSTest Safe (test) | Refusal Rate: 0 | 56 |
| Response Harmfulness Detection | XSTEST-RESP | Response Harmfulness F1: 95.48 | 34 |
| Safety Evaluation | XSTest (test) | XSTest Score: 95 | 32 |
| Safety Evaluation | XSTest | Safety Score: 98.4 | 23 |
| Harmlessness | XsTest | Refusal Rate: 99 | 20 |
| Adversarial and Jailbreaking Attack Detection | XSTest | AUROC: 0.8418 | 20 |
| Safety Classification | XSTest (test) | F1: 92.91 | 20 |
| Safety Evaluation | XsTest | Harmful Rate: 2 | 16 |
| Response Classification | XSTest Text Response | F1 Score: 98.43 | 16 |
| Safety Classification | XSTest | F1 Score: 94 | 16 |
| Prompt classification | XSTest | F1 Score: 94.8 | 16 |
| Safety Evaluation | XSTest Toxic | Safety: 95 | 15 |
| Refusal Evaluation | XSTest Seemingly Toxic Subsets | XS: 98 | 15 |
| Safety Alignment | XSTest | Compliance: 95.2 | 15 |
| Safety Evaluation | XSTest | ASR: 0.9 | 14 |
| Prompt Classification | XSTest Text Prompt | F1 Score: 93.71 | 14 |
| Safety Evaluation | XSTest | FRR: 1.6 | 14 |
| Safety Classification | XSTestResponse | F1 Score: 0.96 | 14 |
| Safety Evaluation | XSTest (out-of-domain) | Accuracy: 88.67 | 12 |
| Output Moderation | XSTEST-RESP (XSTESTR) | F1 Score: 92.7 | 11 |
| Safety & Helpfulness Evaluation | XSTest | XSTest Score: 74.8 | 11 |
| Prompt-Response Safety Routing | XSTest | Routing F1: 56.21 | 10 |
| Overblocking evaluation | XSTEST-RESP (benign) | F1 Score: 89.9 | 9 |
| Safety Evaluation | XSTest | Safe Compliance: 98.1 | 9 |