
The Pile

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Language Modeling | The Pile | Perplexity: 4.14 | 94 |
| Language Modeling | The Pile (test) | PPL (The Pile Test): 9.213 | 51 |
| Language Modeling | The Pile (val) | Perplexity (bits/byte): 0.62 | 31 |
| Language Modeling | The Pile deduplicated (val) | Perplexity: 7.14 | 22 |
| Language Modeling | The Pile non-copyrighted (test) | BPB: 0.557 | 20 |
| Knowledge Unlearning | The Pile 32 sample (val) | EL10 (%): 0 | 15 |
| Membership Inference Attack | The Pile | AUROC: 0.927 | 14 |
| Language Modeling | The Pile (eval) | Perplexity (PPL): 14.1 | 12 |
| Training Data Extraction | The Pile (train) | Exact Extract Rate: 45 | 10 |
| Data Extraction | The Pile (test) | Fractional Extraction Rate: 63.4 | 10 |
| Language Modeling | The Pile non-copyrighted without Wikipedia (test) | BPB: 0.559 | 8 |
| General NLP Evaluation | The Pile Downstream Evaluation Suite | HellaSwag Accuracy: 29.7 | 7 |
| Membership Inference Attack | The PILE (train test) | Loss: 66.5 | 7 |
| Property-based retrieval | The Pile (test) | MAP: 54.2 | 6 |
| Knowledge Distillation | The Pile | Raw KL Divergence: 1,200 | 5 |
| Unsupervised OOD detection | The Pile (ID) Twitter (OOD) (test) | AUROC: 99.22 | 5 |
| Unsupervised OOD detection | The Pile EDGAR Reports ID OOD (test) | AUROC: 68.09 | 5 |
| Unsupervised OOD detection | The Pile ID 4Chan OOD (test) | AUROC: 87.97 | 5 |
| Language Modeling | The Pile PubMed Central (test) | PPL: 7.25 | 2 |
| Language Modeling | The Pile Github (test) | Perplexity (PPL): 3.42 | 2 |
| Language Modeling | The Pile FreeLaw (test) | Perplexity (PPL): 4.85 | 2 |
| Language Modeling | The Pile DM Math (test) | Perplexity: 7.81 | 2 |
| Language Modeling | The Pile ArXiv (test) | Perplexity (PPL): 9.92 | 2 |