| Dataset Name | SOTA Method | Metric | Score | Trend | Last Updated |
|---|---|---|---|---|---|
| PAVE (test) | JSD-based arbitration | IE | 59 | 45 | 1d ago |
| MemoTrap | RCA | Micro Accuracy | 77.35 | 4 | 1d ago |
| NQ-Swap | RCA | Exact Match | 77.46 | 4 | 1d ago |
| KID-Bench v2 | RAG | Performance (Difficulty A) | 97.6 | 4 | 1mo ago |
| RippleEdits style 40 q | Conflict-Aware | Accuracy | 77.5 | 4 | 1mo ago |
| KID-Bench Category C v2 | Conflict-Aware | Accuracy (C-Light) | 78.1 | 3 | 1mo ago |
| CounterFact 500 | Conflict-Aware | Accuracy | 96.6 | 3 | 1mo ago |
| CounterFact-style 40 q | Conflict-Aware | Accuracy | 67.5 | 3 | 1mo ago |
| Held-out 30 q | Conflict-Aware | Accuracy | 76.7 | 3 | 1mo ago |