| Dataset | SOTA Method | Metric | Score | Trend | Last Updated |
|---|---|---|---|---|---|
| zsRE | AlphaEdit | Efficacy | 99.79 | 71 | 11d ago |
| UltraEditBench | UltraEdit | Efficacy | 85.7 | 51 | 1mo ago |
| WikiBigEdit | UltraEdit | Efficacy | 79.6 | 49 | 1mo ago |
| FEVER | UltraEdit* | Efficacy | 98.23 | 49 | 1mo ago |
| WikiBigEdit | | MMLU | 69.5 | 34 | 1mo ago |
| RIPE | DAFNet | Reliability | 97.8 | 30 | 1mo ago |
| CounterFact | DAFNet | Reliability | 92 | 30 | 1mo ago |
| CounterFact | CrispEdit | Reliability | 79.4 | 26 | 1mo ago |
| ZsRE | CrispEdit | Reliability | 80.5 | 26 | 1mo ago |
| CounterFact | HORSE | Efficacy | 96.12 | 24 | 1mo ago |
| ZSRE sequential editing of 1000 facts | | Efficacy | 99.7 | 21 | 1mo ago |
| RuleEdit-200 | DMLE | Reliability (Rel.) | 98.17 | 20 | 8d ago |
| ZSRE | DAFNet | Reliability | 0.975 | 16 | 1mo ago |
| WikiBigEdit 3,000 samples (test) | LocBF-FT | Reliability | 99.9 | 13 | 1mo ago |
| CounterFact 3,000 samples (test) | CrispEdit | Reliability | 9,980 | 13 | 1mo ago |
| ZsRE 3,000 samples (test) | LocBF-FT | Relational Score | 99.1 | 13 | 1mo ago |
| E-IC 5 | DSCA | Reliability (Rel.) | 98 | 11 | 8d ago |
| E-VQA 5 | DSCA | Reliability Score | 98.12 | 11 | 8d ago |
| UnKE | UltraEdit | Efficacy | 94.09 | 11 | 1mo ago |
| CounterFact | | Efficacy | 98.1 | 10 | 1mo ago |
| COUNTERFACT 7,500-record GPT-2 XL (test) | ROME | Score | 89.2 | 9 | 1mo ago |
| COUNTERFACT | AlphaEdit | Efficacy | 99.75 | 8 | 11d ago |
| CounterFact | CrispEdit | Rel (QA Context) | 64.6 | 8 | 1mo ago |
| ZsRE 3,000 samples | CrispEdit | Rel Score (QA Context) | 77.8 | 8 | 1mo ago |
| Sanitation | | Locality | 0.2218 | 8 | 1mo ago |