| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Refusal Evaluation | OR-Bench | Toxic Refusal Rate | 99.6 | 40 |
| Safety Guardrailing | OR-Bench | False Positive Rate | 0 | 26 |
| Overrefusal Detection | OR-Bench | AUROC | 93.42 | 18 |
| Over-refusal Rate Analysis | OR-Bench | Over-refusal Rate | 24.1 | 15 |
| Benign Prompt Filtering | OR-Bench | False Positive Rate | 0 | 12 |
| Guardrail False Positive Rate Estimation | OR-Bench benign prompts | False Positive Rate | 0 | 8 |
| Refusal Evaluation | OR-Bench Toxic | Refusal Rate | 94.66 | 7 |
| Safety and Utility Evaluation | OR-Bench | HarmR | 5.3 | 4 |
| Adversarial Attack | OR-Bench unsafe inputs | ASR | 88 | 4 |
| Safety Classification | OR-Bench | F1 Score | 77 | 3 |