| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Indirect Object Identification | IOI | Last-Token KL Divergence: 0.15 | 40 |
| Component-level attribution | IOI | Dissimilarity (dis.): 0 | 32 |
| Circuit Localization | IOI | CPR: 3.54 | 30 |
| Circuit Recovery | IOI | CPR Faithfulness Score: 91 | 14 |
| Circuit Discovery | IOI | AUC: 83.6 | 12 |
| MCQ Classification | IOI 2 v1 (Eva) | Accuracy: 100 | 6 |
| MCQ Classification | IOI v1 (Infer) | Accuracy: 1 | 6 |
| Circuit Discovery | IOI | KL Div: 0.668 | 6 |
| Competitive Programming | IOI 2025 | Score S: 100 | 4 |
| Autointerpretation | IOI | Accuracy: 76 | 4 |
| Intrinsic Cluster Quality Evaluation | IOI Pythia-160M | Mean Silhouette Score: 0.07 | 3 |
| Intrinsic Cluster Quality Evaluation | IOI | Silhouette Score (Mean): 0.03 | 3 |
| Circuit Discovery | IOI | Sparsity: 96.74 | 3 |
| Circuit Discovery | IOI 400 examples v1 | KL Divergence: 0.22 | 3 |
| Circuit Discovery | IOI 200 examples v1 | KL Divergence: 0.25 | 3 |
| Code Reasoning | IOI 2025 | Score: 439.28 | 2 |
| Circuit Discovery | IOI 100K examples v1 | KL Divergence: 0.2 | 2 |
| Indirect Object Identification | IOI evaluation episodes (held-out) | Policy Score: 2.976 | 1 |