
GITHUB

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Language Modeling | GitHub (test) | Perplexity 2.42 | 113 |
| Membership Inference Attack | GitHub Pythia | ROC AUC 1 | 36 |
| Membership Inference | GitHub Pythia (train) | TPR@1%FPR 95.6 | 36 |
| Membership Inference Attack | GitHub | AUC 0.876 | 32 |
| Semi-supervised graph classification | GITHUB 10-fold cross-validation | Accuracy 0.6996 | 21 |
| Graph Classification | GITHUB | Accuracy 71.06 | 18 |
| Language Modeling | GitHub (val) | Perplexity 1.83 | 13 |
| Language Modeling | GitHub tokens (test) | Bits Per Token (BPT) 0.976 | 11 |
| Tokenization efficiency | GitHub | Token Count 688 | 6 |
| Autonomous Task Completion | GitHub | Success Rate 84 | 6 |
| Token Efficiency | GitHub Events | JSON Compact Token Count 968 | 1 |
| Graph Explanation | Github (test) | FID 65.01 | 1 |
| Website Navigation | GitHub (test) | - | 0 |