| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| XSTest | | Overrefusal Rate | 0 | 78 | 5d ago |
| Wildjailbreak (Benign) | Categorical Steering | Wildjailbreak Benign Refusal Rate | 1.43 | 49 | 1mo ago |
| Over Refusal scenario | OLMo-2-7B | ASR (Attacked) | 98.7 | 24 | 11d ago |
| Over-refusal Evaluation Suite (XSTest, WildJailbreak, WildGuard, OKTest, OR-Bench) | | XSTest Refusal Rate (%) | 3.2 | 24 | 1mo ago |
| Over-refusal XSTest and OKTest | ReSA-RL (Ours) | Over-refusal Accuracy (XSTest) | 99.2 | 12 | 1mo ago |