| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Membership Inference Attack | Wikipedia | AUC | 0.9 | 52 |
| Dynamic Graph Anomaly Detection | Wikipedia S2 | AUROC | 83.39 | 42 |
| Response correctness and completeness evaluation | Wikipedia | F1 Score | 68 | 38 |
| Membership Inference Attack | Wikipedia Pythia | ROC AUC | 74 | 36 |
| Membership Inference | Wikipedia Pythia (train) | TPR@1%FPR | 22.7 | 36 |
| Reliability of post-edit LLMs | Wikipedia | BLEU | 100 | 36 |
| transductive dynamic link prediction | Wikipedia | AUC ROC | 98.91 | 27 |
| Dynamic link prediction | Wikipedia | AP | 99.03 | 27 |
| Membership Inference Attack | Wikipedia en | AUC | 0.79 | 26 |
| Inductive dynamic link prediction | Wikipedia (inductive) | AUC-ROC | 0.9848 | 24 |
| Dynamic Link Prediction | Wikipedia Inductive | AP | 98.59 | 24 |
| Document Classification | Wikipedia (test) | Classification Error | 30.24 | 24 |
| Link Prediction | Wikipedia (inductive) | AP | 99.04 | 21 |
| Link Prediction | Wikipedia transductive | AP | 99.31 | 21 |
| Machine-paraphrased plagiarism detection | Wikipedia SpinBot paraphrased (test) | F1-Micro | 89.55 | 15 |
| Language Modeling | Wikipedia | Perplexity | 11.64 | 14 |
| AI-generated text detection | Wikipedia OPT-13B generations (+ 60L,600) | Accuracy (1% FPR) | 97.2 | 14 |
| Page Classification | Wikipedia (90% train ratio) | Macro-F1 Score | 83.66 | 13 |
| Link prediction | Wikipedia | AUC | 99.2 | 12 |
| Text-to-Image Retrieval | Wikipedia random partition (test) | MAP (0.2 Noise) | 47.1 | 11 |
| Image-to-Text Retrieval | Wikipedia random partition (test) | MAP (0.2 noise) | 51.6 | 11 |
| Node Classification | Wikipedia rich-text graph (test) | Accuracy | 90.3 | 10 |
| Node Classification | Wikipedia (test) | NMI | 0.795 | 10 |
| Sentence Splitting | Wikipedia BOTH-AB (sentences split by both systems) | Average Score | 4.75 | 10 |
| Time Series Forecasting | Wikipedia | Distortion | 1.04 | 9 |