| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Knowledge Conflict Resolution | KID-Bench v2 | Performance (Difficulty A) | 97.6 | 4 |
| Knowledge Conflict Resolution | KID-Bench Category C v2 | Accuracy (C-Light) | 78.1 | 3 |
| Knowledge Combination | KID-Bench Category B v2 | Accuracy | 68 | 3 |
| Novel Knowledge Recall | KID-Bench Category A v2 | Accuracy | 97.1 | 3 |