| Dataset Name | SOTA Method | Metric | Value | | Updated |
|---|---|---|---|---|---|
| XSTest | | Overrefusal Rate | 0 | 102 | 7d ago |
| Wildjailbreak (Benign) | Categorical Steering | Wildjailbreak Benign Refusal Rate | 1.43 | 49 | 2mo ago |
| Over Refusal scenario | | ASR | 100 | 42 | 19d ago |
| Over-refusal Evaluation Suite (XSTest, WildJailbreak, WildGuard, OKTest, OR-Bench) | | XSTest Refusal Rate (%) | 3.2 | 24 | 3mo ago |
| Over-refusal XSTest and OKTest | ReSA-RL (Ours) | Over-refusal Accuracy (XSTest) | 99.2 | 12 | 2mo ago |