| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Node Classification | Arxiv | Accuracy 83.78 | 219 |
| Summarization | arXiv (test) | ROUGE-1 64.16 | 161 |
| Language Modeling | ARXIV (test) | PPL 2.36 | 145 |
| Node Classification | arXiv-year | Accuracy 64.62 | 112 |
| Summarization | Arxiv | ROUGE-2 23.05 | 76 |
| Language Modeling | arXiv | Perplexity 2.46 | 55 |
| Node Classification | Arxiv | Clean Accuracy 66.83 | 52 |
| Membership Inference Attack | arXiv Pythia | ROC AUC 94 | 36 |
| Node Classification | Arxiv (test) | ASR 97.03 | 32 |
| Membership Inference Attack | ArXiv | AUC 85 | 32 |
| Node Classification | Arxiv Covariate shift (degree split) | OOD Accuracy 66.41 | 30 |
| Graph Backdoor Attack | Arxiv | ASR 97.03 | 28 |
| GNN training | ARXIV | Speedup 1.044 | 24 |
| Language Modeling | Arxiv (val) | Perplexity 18.22 | 24 |
| Summarization | ArXiv (test) | Completeness Score 5 | 24 |
| Long-document summarization | ArXiv (test) | ROUGE-2 Score 22.5 | 24 |
| Text Segmentation | arXiv | Pk 0.3733 | 22 |
| Node Classification | arxiv | ASR 99.96 | 21 |
| Rubric satisfaction evaluation | ArXiv | Claude-4 Sonnet Score 38.1 | 21 |
| Node unlearning | Arxiv | Average Runtime (s) 0.03 | 20 |
| Masked Language Modeling Fine-tuning | arXiv (fine-tuning) | MSE 7.92 | 20 |
| Node Classification | Arxiv Covariate shift time split | OOD Test Accuracy 66.47 | 20 |
| Abstractive Summarization | arXiv (test) | R-1 53.7 | 20 |
| Link Prediction | arXiv 14 (test) | AUC 93.66 | 20 |
| Node Classification | Arxiv (2018-2020) | Accuracy 60.78 | 18 |