| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| OrBench-H | | RR | 99.85 | 21 | 2mo ago |
| Fortress OR | RECAP | Helpfulness Score | 97.6 | 12 | 1mo ago |
| Over-refusal Evaluation Suite OB, FB | WaltzRL | Overrefusal Rate (OB) | 9.9 | 7 | 1mo ago |
| SQL-1k | | Refusal Rate (RR) | 0.3 | 6 | 2mo ago |
| GSM-8k | | RR | 0 | 6 | 2mo ago |
| JBench-B | | RR | 92 | 6 | 2mo ago |
| Koala | Db as Our Data | Refusal Rate | 4.44 | 6 | 2mo ago |
| TensorTrust overrefusal | | Performance Score | 91 | 2 | 2mo ago |
| IH-Challenge overrefusal | GPT-5-Mini-R | Performance Score | 1 | 2 | 2mo ago |