| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Machine Unlearning | WMDP | Bio Accuracy 24.8 | 74 | |
| Harmful Knowledge Evaluation | WMDP evil | WMDP-evil Score 65.37 | 60 | |
| Knowledge Unlearning | WMDP bio | Accuracy 71.2 | 51 | |
| Knowledge Unlearning | WMDP cyber | Accuracy 47.21 | 47 | |
| Question Answering | WMDP Biology | Default Score 64.5 | 38 | |
| Question Answering | WMDP Cyber QA | Default Accuracy 44.3 | 38 | |
| Knowledge Retention | WMDP retain | Retain 48.3 | 36 | |
| Knowledge Recovery | WMDP-Bio 100-sample subset | ASR 0.93 | 36 | |
| Machine Unlearning | WMDP Cyber (test) | MMLU 61.15 | 29 | |
| Machine Unlearning | WMDP-cyber 1.0 (test) | BF16 Score 53.7 | 28 | |
| Machine Unlearning | WMDP-chem 1.0 (test) | BF16 0.56 | 28 | |
| Machine Unlearning | WMDP-bio 1.0 (test) | BF16 Accuracy 80.3 | 28 | |
| Knowledge Unlearning | WMDP | Performance (Bio) 75.9 | 26 | |
| Hazard Knowledge Evaluation | WMDP | Accuracy 68.98 | 26 | |
| Unlearning | WMDP retain | Retain 55.6 | 22 | |
| Unlearning | WMDP (forget split) | BF16 Precision 55.4 | 22 | |
| Fluency Assessment | WMDP | Mean Fluency 3.46 | 22 | |
| Machine Unlearning | WMDP | Acc (Bio) 74.16 | 21 | |
| Dangerous Knowledge Unlearning | WMDP | S-unlearning Score 43 | 16 | |
| Knowledge Retention | WMDP cyber (retain) | Rt 54.1 | 16 | |
| Machine Unlearning | WMDP-cyber forget-set | BF16 Performance 53.7 | 16 | |
| Knowledge Retention | WMDP-chem (retain) | Rt (Knowledge Retention) 56 | 16 | |
| Machine Unlearning | WMDP chem forget-set | BF16 Score 56 | 16 | |
| Knowledge Retention | WMDP bio (retain) | Rt 80.3 | 16 | |
| Structural Erasure | WMDP-cyber | CAD 0 | 16 | |