| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| OrBench-H | | RR | 99.85 | 21 | 20d ago |
| Fortress OR | RECAP | Helpfulness Score | 97.6 | 12 | 6d ago |
| SQL-1k | | Refusal Rate (RR) | 0.3 | 6 | 1mo ago |
| GSM-8k | | RR | 0 | 6 | 1mo ago |
| JBench-B | | RR | 92 | 6 | 1mo ago |
| Koala | Db as Our Data | Refusal Rate | 4.44 | 6 | 1mo ago |
| TensorTrust overrefusal | | Performance Score | 91 | 2 | 1mo ago |
| IH-Challenge overrefusal | GPT-5-Mini-R | Performance Score | 1 | 2 | 1mo ago |