| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Knowledge Editing | Counterfact | Efficacy 9,387 | 301 |
| Sequential model editing | Counterfact | Efficacy 99.77 | 61 |
| Subject inference attack | CounterFact batch-edit tasks | Recall 100 | 36 |
| Lifelong Model Editing | CounterFact | Efficacy 73.06 | 33 |
| Sequential Knowledge Editing | CounterFact sequential editing 10,000 Samples | Efficacy Success 99.5 | 33 |
| Knowledge Editing | Counterfact uns | Edit Success Rate 94.56 | 30 |
| Model Editing | CounterFact | Reliability 92 | 30 |
| Knowledge Editing | Counterfact 10,000 facts | Relational Score 10,000 | 27 |
| Knowledge Model Editing | CounterFact | Efficacy 64.85 | 26 |
| Model Editing | CounterFact | Reliability 79.4 | 26 |
| Model Editing | CounterFact | Efficacy 96.12 | 24 |
| Classification Probing | Counterfact (test) | Probe Acc (Best Layer) 89.6 | 21 |
| Knowledge Editing | Counterfact Full (test) | Rel. Accuracy 99 | 21 |
| Sequential Knowledge Editing | CounterFact larger | Efficacy 98.97 | 14 |
| Sequential Knowledge Editing | CounterFact top | Efficacy 93.87 | 14 |
| Lifelong Knowledge Editing | COUNTERFACT | Reliability 67.1 | 14 |
| Model Editing | CounterFact 3,000 samples (test) | Reliability 9,980 | 13 |
| Knowledge Editing | Counterfact (test) | RwA 99.86 | 12 |
| Prompt recovery attack | CounterFact | Top-1 Accuracy 60 | 12 |
| Model Editing | CounterFact | Efficacy 98.1 | 10 |
| Sequential Model Editing | CounterFact full (10K sequential edits) (test) | Efficacy 94.45 | 10 |
| Sequential Knowledge Editing | CounterFact | Efficacy 100 | 10 |
| Prompt recovery attack | CounterFact (test) | Top-1 Accuracy 54 | 9 |
| Model Editing | COUNTERFACT 7,500-record GPT-2 XL (test) | Score 89.2 | 9 |
| Model Editing | COUNTERFACT | Efficacy 99.75 | 8 |