| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Knowledge Editing | Counterfact | Efficacy 9,387 | 91 | |
| Subject inference attack | CounterFact batch-edit tasks | Recall 100 | 36 | |
| Sequential Knowledge Editing | CounterFact sequential editing 10,000 Samples | Efficacy Success 99.5 | 33 | |
| Model Editing | CounterFact | Reliability 92 | 30 | |
| Knowledge Editing | Counterfact 10,000 facts | Relational Score 10,000 | 27 | |
| Model Editing | CounterFact | Reliability 79.4 | 26 | |
| Model Editing | CounterFact | Efficacy 96.12 | 24 | |
| Sequential model editing | Counterfact | Efficacy 99.55 | 24 | |
| Classification Probing | Counterfact (test) | Probe Acc (Best Layer) 89.6 | 21 | |
| Knowledge Editing | Counterfact Full (test) | Rel. Accuracy 99 | 21 | |
| Lifelong Knowledge Editing | COUNTERFACT | Reliability 67.1 | 14 | |
| Model Editing | CounterFact 3,000 samples (test) | Reliability 9,980 | 13 | |
| Knowledge Editing | Counterfact (test) | RwA 99.86 | 12 | |
| Prompt recovery attack | CounterFact | Top-1 Accuracy 60 | 12 | |
| Sequential Model Editing | CounterFact full (10K sequential edits) (test) | Efficacy 94.45 | 10 | |
| Sequential Knowledge Editing | CounterFact | Efficacy 100 | 10 | |
| Prompt recovery attack | CounterFact (test) | Top-1 Accuracy 54 | 9 | |
| Model Editing | COUNTERFACT 7,500-record GPT-2 XL (test) | Score 89.2 | 9 | |
| Model Editing | CounterFact | Rel (QA Context) 64.6 | 8 | |
| Hallucination Detection | counterfact | AUROC 0.84 | 8 | |
| Knowledge Editing | Counterfact (val) | Relational Score 1 | 8 | |
| Knowledge Editing | Counterfact (first 2000 edits) | Accuracy 99.95 | 8 | |
| Knowledge Editing | Counterfact (first 150 edits) | DI Score 98.67 | 8 | |
| Knowledge Editing | CounterFact 15000 (test) | Efficacy 91.22 | 6 | |
| Subject inference attack | CounterFact | Attack Performance 100 | 6 | |