
ArXiv

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Summarization | arXiv (test) | ROUGE-1 64.16 | 161 |
| Language Modeling | ARXIV (test) | PPL 2.36 | 137 |
| Node Classification | arXiv-year | Accuracy 64.62 | 85 |
| Summarization | Arxiv | ROUGE-2 23.05 | 76 |
| Node Classification | Arxiv | Accuracy 78.26 | 41 |
| Membership Inference Attack | arXiv Pythia | ROC AUC 94 | 36 |
| Node Classification | Arxiv Covariate shift (degree split) | OOD Accuracy 66.41 | 30 |
| Membership Inference Attack | ArXiv | AUC 85 | 26 |
| Summarization | ArXiv (test) | Completeness Score 5 | 24 |
| Long-document summarization | ArXiv (test) | ROUGE-2 Score 22.5 | 24 |
| Rubric satisfaction evaluation | ArXiv | Claude-4 Sonnet Score 38.1 | 21 |
| Language Modeling | arXiv | Perplexity 17.47 | 21 |
| Node unlearning | Arxiv | Average Runtime (s) 0.03 | 20 |
| Masked Language Modeling Fine-tuning | arXiv (fine-tuning) | MSE 7.92 | 20 |
| Node Classification | Arxiv Covariate shift time split | OOD Test Accuracy 66.47 | 20 |
| Abstractive Summarization | arXiv (test) | R-1 53.7 | 20 |
| Link Prediction | arXiv 14 (test) | AUC 93.66 | 20 |
| Watermark Segment Classification | Arxiv Mistral-7B (val) | TPR 100 | 18 |
| Watermark Segment Classification | Arxiv Llama-7B (val) | TPR 100 | 18 |
| Summarization | arXiv original (test) | R-1 60 | 18 |
| Node Classification | Arxiv overall | Accuracy 74.7 | 17 |
| Graph Continual Learning | Arxiv (test) | AA 90.3 | 16 |
| Machine-paraphrased plagiarism detection | arXiv SpinBot paraphrased (test) | F1 (Micro) 86.46 | 15 |
| Link Prediction | Arxiv 2023 | PRC 78 | 14 |
| Node Classification | Arxiv 2023 (test) | Accuracy 58.2 | 14 |
Showing 25 of 115 rows