| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Summarization | arXiv (test) | ROUGE-1 64.16 | 161 |
| Language Modeling | ARXIV (test) | PPL 2.36 | 137 |
| Node Classification | arXiv-year | Accuracy 64.62 | 85 |
| Summarization | Arxiv | ROUGE-2 23.05 | 76 |
| Node Classification | Arxiv | Accuracy 78.26 | 41 |
| Membership Inference Attack | arXiv Pythia | ROC AUC 94 | 36 |
| Node Classification | Arxiv Covariate shift (degree split) | OOD Accuracy 66.41 | 30 |
| Membership Inference Attack | ArXiv | AUC 85 | 26 |
| Summarization | ArXiv (test) | Completeness Score 5 | 24 |
| Long-document summarization | ArXiv (test) | ROUGE-2 Score 22.5 | 24 |
| Rubric satisfaction evaluation | ArXiv | Claude-4 Sonnet Score 38.1 | 21 |
| Language Modeling | arXiv | Perplexity 17.47 | 21 |
| Node unlearning | Arxiv | Average Runtime (s) 0.03 | 20 |
| Masked Language Modeling Fine-tuning | arXiv (fine-tuning) | MSE 7.92 | 20 |
| Node Classification | Arxiv Covariate shift time split | OOD Test Accuracy 66.47 | 20 |
| Abstractive Summarization | arXiv (test) | R-1 53.7 | 20 |
| Link Prediction | arXiv 14 (test) | AUC 93.66 | 20 |
| Watermark Segment Classification | Arxiv Mistral-7B (val) | TPR 100 | 18 |
| Watermark Segment Classification | Arxiv Llama-7B (val) | TPR 100 | 18 |
| Summarization | arXiv original (test) | R-1 60 | 18 |
| Node Classification | Arxiv overall | Accuracy 74.7 | 17 |
| Graph Continual Learning | Arxiv (test) | AA 90.3 | 16 |
| Machine-paraphrased plagiarism detection | arXiv SpinBot paraphrased (test) | F1 (Micro) 86.46 | 15 |
| Link Prediction | Arxiv 2023 | PRC 78 | 14 |
| Node Classification | Arxiv 2023 (test) | Accuracy 58.2 | 14 |