| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Over-refusal | XSTest | Overrefusal Rate | 0 | 102 |
| Safety Evaluation | XSTest Unsafe | False Compliance Rate (FC) | 0 | 78 |
| Safety Evaluation | XSTest Safe | FC | 4 | 78 |
| Response Harmfulness Detection | XSTEST-RESP | Response Harmfulness F1 | 95.48 | 76 |
| Refusal Rate Evaluation | XSTest Safe (test) | Refusal Rate | 0 | 56 |
| Safety Evaluation | XSTest | F1 Score | 97 | 44 |
| Safety Evaluation | XSTest (test) | XSTest Score | 95 | 36 |
| Adversarial and Jailbreaking Attack Detection | XSTest | AUROC | 0.984 | 35 |
| Safety Evaluation | XSTest (combined) | F1 Score | 100 | 34 |
| Safety Evaluation | XSTest | Safety Score | 98.4 | 32 |
| Over-refusal Evaluation | XSTest | Evaluation Score (avg@4) | 100 | 26 |
| Refusal Evaluation | XSTest Unsafe | Refusal Rate | 100 | 25 |
| Over-refusal Evaluation | XSTest Safe | Over-refusal Rate | 1.6 | 25 |
| Safety Alignment | XSTest | Compliance | 95.2 | 21 |
| Harmful prompt detection | XSTest | F1 Score | 97.44 | 20 |
| Harmlessness | XsTest | Refusal Rate | 99 | 20 |
| Safety Classification | XSTest (test) | F1 | 92.91 | 20 |
| Safety Evaluation | XSTest | FRR | 1.6 | 19 |
| Safety Evaluation | XsTest | Harmful Rate | 2 | 16 |
| Response Classification | XSTest Text Response | F1 Score | 98.43 | 16 |
| Safety Classification | XSTest | F1 Score | 94 | 16 |
| Prompt classification | XSTest | F1 Score | 94.8 | 16 |
| Safety Evaluation | XSTest Toxic | Safety | 95 | 15 |
| Refusal Evaluation | XSTest Seemingly Toxic Subsets | XS | 98 | 15 |
| Model Utility Evaluation | XSTest | CR | 96.4 | 14 |