| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| NQ (Natural Questions) | VectorSteer | ORR | 0 | 72 | 4d ago |
| ORFuzzSet | LLM-VA | ORR | 16 | 72 | 4d ago |
| MMMU in-scope (test) | Prompt-based | Math Score | 37 | 32 | 4d ago |
| ScienceQA in-scope (test) | System Prompt | Biology Refusal Count | 0 | 32 | 4d ago |
| XSTest (test) | Claude Sonnet 4.5 | Over-refusal Rate | 0.035 | 4 | 4d ago |