| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| UltraEditBench | STABLEEDIT | Efficacy 88.88 | 78 | 21d ago | |
| ZsRE | HoReN | Reliability 1 | 72 | 22d ago | |
| zsRE | AlphaEdit | Efficacy 99.79 | 71 | 1mo ago | |
| RIPE | LTE | Reliability 99.76 | 50 | 1mo ago | |
| WikiBigEdit | UltraEdit | Efficacy 79.6 | 49 | 2mo ago | |
| FEVER | UltraEdit* | Efficacy 98.23 | 49 | 2mo ago | |
| FEVER 20K edits (test) | STABLEEDIT | Efficacy 99.07 | 36 | 21d ago | |
| WikiBigEdit | | MMLU 69.5 | 34 | 22d ago | |
| CounterFact | DAFNet | Reliability 92 | 30 | 3mo ago | |
| CounterFact | CrispEdit | Reliability 79.4 | 26 | 3mo ago | |
| ZsRE | CrispEdit | Reliability 80.5 | 26 | 3mo ago | |
| CounterFact | HORSE | Efficacy 96.12 | 24 | 3mo ago | |
| ZSRE sequential editing of 1000 facts | | Efficacy 99.7 | 21 | 3mo ago | |
| RuleEdit-200 | DMLE | Reliability (Rel.) 98.17 | 20 | 1mo ago | |
| ZSRE | DAFNet | Reliability 0.975 | 16 | 3mo ago | |
| WikiBigEdit 3,000 samples (test) | LocBF-FT | Reliability 99.9 | 13 | 3mo ago | |
| CounterFact 3,000 samples (test) | CrispEdit | Reliability 9,980 | 13 | 3mo ago | |
| ZsRE 3,000 samples (test) | LocBF-FT | Relational Score 99.1 | 13 | 3mo ago | |
| E-IC 5 | DSCA | Reliability (Rel.) 98 | 11 | 1mo ago | |
| E-VQA 5 | DSCA | Reliability Score 98.12 | 11 | 1mo ago | |
| UnKE | UltraEdit | Efficacy 94.09 | 11 | 2mo ago | |
| CounterFact | | Efficacy 98.1 | 10 | 2mo ago | |
| WikiBigEdit 500K edits | STABLEEDIT | Efficacy 74.54 | 9 | 21d ago | |
| COUNTERFACT 7,500-record GPT-2 XL (test) | ROME | Score 89.2 | 9 | 3mo ago | |
| COUNTERFACT | AlphaEdit | Efficacy 99.75 | 8 | 1mo ago | |