| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Refusal Rate Evaluation | XSTest Safe (test) | Refusal Rate | 0 | 56 |
| Over-refusal | XSTest | XSTest Score | 86.89 | 42 |
| Response Harmfulness Detection | XSTEST-RESP | Response Harmfulness F1 | 95.48 | 34 |
| Safety Evaluation | XSTest (test) | XSTest Score | 95 | 32 |
| Adversarial and Jailbreaking Attack Detection | XSTest | AUROC | 0.8418 | 20 |
| Safety Classification | XSTest (test) | F1 | 92.91 | 20 |
| Response Classification | XSTest Text Response | F1 Score | 98.43 | 16 |
| Safety Classification | XSTest | F1 Score | 94 | 16 |
| Prompt classification | XSTest | F1 Score | 94.8 | 16 |
| Prompt Classification | XSTest Text Prompt | F1 Score | 93.71 | 14 |
| Safety Evaluation | XSTest | FRR | 1.6 | 14 |
| Safety Classification | XSTestResponse | F1 Score | 0.96 | 14 |
| Safety Evaluation | XSTest | Safety Score | 97.9 | 13 |
| Safety Evaluation | XSTest (out-of-domain) | Accuracy | 88.67 | 12 |
| Safety Alignment | XSTest | Compliance | 95.2 | 12 |
| Safety & Helpfulness Evaluation | XSTest | XSTest Score | 74.8 | 11 |
| Prompt-Response Safety Routing | XSTest | Routing F1 | 56.21 | 10 |
| Refusal Detection | XSTEST-RESP (full) | RR (F1) | 98.1 | 9 |
| Safety Evaluation | XSTEST | HS Rate | 2.05 | 8 |
| Red-teaming Safety Evaluation | XSTEST | HPR | 61 | 8 |
| Safety Moderation | XSTest | F1 Score | 94.9 | 7 |
| Unsafe Prompt Detection | XSTest (test) | Precision | 87.8 | 7 |
| Over-refusal Evaluation | XSTest (test) | Over-refusal Rate | 0.035 | 4 |
| General LLM Evaluation | XSTest | Refusal Rate | 23.1 | 4 |
| Unsafe prompt detection | XSTest | AUPRC | 93.6 | 4 |