Wikipedia

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Membership Inference Attack | Wikipedia | AUC | 0.9 | 52 |
| Dynamic Graph Anomaly Detection | Wikipedia S2 | AUROC | 83.39 | 42 |
| Response correctness and completeness evaluation | Wikipedia | F1 Score | 68 | 38 |
| Membership Inference Attack | Wikipedia Pythia | ROC AUC | 74 | 36 |
| Membership Inference | Wikipedia Pythia (train) | TPR@1%FPR | 22.7 | 36 |
| Reliability of post-edit LLMs | Wikipedia | BLEU | 100 | 36 |
| Transductive dynamic link prediction | Wikipedia | AUC ROC | 98.91 | 27 |
| Dynamic link prediction | Wikipedia | AP | 99.03 | 27 |
| Membership Inference Attack | Wikipedia en | AUC | 0.79 | 26 |
| Inductive dynamic link prediction | Wikipedia (inductive) | AUC-ROC | 0.9848 | 24 |
| Dynamic Link Prediction | Wikipedia Inductive | AP | 98.59 | 24 |
| Document Classification | Wikipedia (test) | Classification Error | 30.24 | 24 |
| Link Prediction | Wikipedia (inductive) | AP | 99.04 | 21 |
| Link Prediction | Wikipedia transductive | AP | 99.31 | 21 |
| Machine-paraphrased plagiarism detection | Wikipedia SpinBot paraphrased (test) | F1-Micro | 89.55 | 15 |
| Language Modeling | Wikipedia | Perplexity | 11.64 | 14 |
| AI-generated text detection | Wikipedia OPT-13B generations (+ 60L,600) | Accuracy (1% FPR) | 97.2 | 14 |
| Page Classification | Wikipedia (90% train ratio) | Macro-F1 Score | 83.66 | 13 |
| Link prediction | Wikipedia | AUC | 99.2 | 12 |
| Text-to-Image Retrieval | Wikipedia random partition (test) | MAP (0.2 Noise) | 47.1 | 11 |
| Image-to-Text Retrieval | Wikipedia random partition (test) | MAP (0.2 noise) | 51.6 | 11 |
| Node Classification | Wikipedia rich-text graph (test) | Accuracy | 90.3 | 10 |
| Node Classification | Wikipedia (test) | NMI | 0.795 | 10 |
| Sentence Splitting | Wikipedia BOTH-AB (sentences split by both systems) | Average Score | 4.75 | 10 |
| Time Series Forecasting | Wikipedia | Distortion | 1.04 | 9 |
Showing 25 of 102 rows
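Several of the rows above, notably the membership-inference entries, report either a threshold-free score (AUC / ROC AUC) or a true-positive rate at a fixed false-positive rate (TPR@1%FPR). As a rough illustration only, the sketch below computes both from binary labels and detector scores using their textbook definitions. The function names `roc_auc` and `tpr_at_fpr` are hypothetical, and the individual papers may interpolate the ROC curve or break ties differently, so this is not the evaluation code behind any listed result. It also helps explain why the table mixes scales: an AUC cannot exceed 1, so entries such as 98.91 are percentages.

```python
def _split(labels, scores):
    """Separate scores into positives (label 1) and negatives (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    return pos, neg


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative; ties count as half a win (O(n*m), fine for small examples)."""
    pos, neg = _split(labels, scores)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def tpr_at_fpr(labels, scores, max_fpr=0.01):
    """Highest TPR over thresholds whose FPR does not exceed max_fpr.
    An example is predicted positive when its score >= threshold."""
    pos, neg = _split(labels, scores)
    best = 0.0  # a threshold above every score gives TPR = FPR = 0
    for t in sorted(set(scores)):
        fpr = sum(n >= t for n in neg) / len(neg)
        if fpr <= max_fpr:
            tpr = sum(p >= t for p in pos) / len(pos)
            best = max(best, tpr)
    return best


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
    print(roc_auc(labels, scores))     # 8/9: one positive (0.3) ranks below one negative (0.7)
    print(tpr_at_fpr(labels, scores))  # 2/3: with 3 negatives, FPR <= 1% forces zero false positives
```

With only a handful of negatives, a 1% FPR budget effectively means zero false positives, which is why low-FPR metrics are typically reported on large evaluation sets.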