| Dataset Name | SOTA Method | Metric | | | Trend |
|---|---|---|---|---|---|
| WikiBigEdit | | MMLU 69.5 | 34 | 4d ago | |
| RIPE | DAFNet | Reliability 97.8 | 30 | 4d ago | |
| CounterFact | DAFNet | Reliability 92 | 30 | 4d ago | |
| CounterFact | CrispEdit | Reliability 79.4 | 26 | 4d ago | |
| ZsRE | CrispEdit | Reliability 80.5 | 26 | 4d ago | |
| CounterFact | HORSE | Efficacy 96.12 | 24 | 4d ago | |
| zsRE | HORSE | Efficacy 98.91 | 24 | 4d ago | |
| ZSRE | DAFNet | Reliability 0.975 | 16 | 4d ago | |
| WikiBigEdit 3,000 samples (test) | LocBF-FT | Reliability 99.9 | 13 | 4d ago | |
| CounterFact 3,000 samples (test) | CrispEdit | Reliability 9,980 | 13 | 4d ago | |
| ZsRE 3,000 samples (test) | LocBF-FT | Relational Score 99.1 | 13 | 4d ago | |
| COUNTERFACT 7,500-record GPT-2 XL (test) | ROME | Score 89.2 | 9 | 4d ago | |
| CounterFact | CrispEdit | Rel (QA Context) 64.6 | 8 | 4d ago | |
| ZsRE 3,000 samples | CrispEdit | Rel Score (QA Context) 77.8 | 8 | 4d ago | |
| Sanitation | | Locality 0.2218 | 8 | 4d ago | |
| Hallucination | Defer | TRR 8,183.7 | 8 | 4d ago | |
| SCOTUS | GRACE | TRR 81 | 7 | 4d ago | |
| zsRE | Defer | TRR 72 | 7 | 4d ago | |
| ZsRE No Context 10K | CrispEdit | Reliability 31.1 | 6 | 4d ago | |
| ECBD Popular | Prepend Def. | Target PPL 31.7 | 6 | 4d ago | |
| zsRE | ROME | Efficacy 99.6 | 6 | 4d ago | |
| COUNTERFACT | ROME | S 88.2 | 6 | 4d ago | |
| COUNTERFACT 2,000-record GPT-J (test) | ROME | Score (S) 91.5 | 5 | 4d ago | |