The Pile

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Language Modeling | The Pile | Perplexity: 2.53 | 129 |
| Language Modeling | The Pile (test) | PPL (The Pile Test): 9.213 | 53 |
| Language Modeling | The Pile (val) | Perplexity (bits/byte): 0.62 | 31 |
| Language Modeling | The Pile deduplicated (val) | Perplexity: 7.14 | 22 |
| Language Modeling | The Pile non-copyrighted (test) | BPB: 0.557 | 20 |
| Language Modeling | The Pile (val) | Loss: 1.6923 | 15 |
| Knowledge Unlearning | The Pile 32 sample (val) | EL10 (%): 0 | 15 |
| Membership Inference Attack | The Pile | AUROC: 0.927 | 14 |
| Language Modeling | The Pile (eval) | Perplexity (PPL): 14.1 | 12 |
| Training Data Extraction | The Pile (train) | Exact Extract Rate: 45 | 10 |
| Data Extraction | The Pile (test) | Fractional Extraction Rate: 63.4 | 10 |
| Language Modeling | The Pile non-copyrighted without Wikipedia (test) | BPB: 0.559 | 8 |
| General NLP Evaluation | The Pile Downstream Evaluation Suite | HellaSwag Accuracy: 29.7 | 7 |
| Membership Inference Attack | The PILE (train test) | Loss: 66.5 | 7 |
| Language Modeling | The Pile Wikipedia (test) | BPB: 66.71 | 6 |
| Language Modeling | The Pile USPTO (test) | BPB: 65.7 | 6 |
| Language Modeling | The Pile StackEx. (test) | BPB (%): 80.37 | 6 |
| Language Modeling | The Pile PubMed Cent. (test) | BPB%: 84.13 | 6 |
| Language Modeling | The Pile PubMed Abs. (test) | BPB (%): 87.7 | 6 |
| Language Modeling | The Pile NIH (test) | BPB: 66.05 | 6 |
| Language Modeling | The Pile HackerNews (test) | BPB (Bits Per Byte): 0.8583 | 6 |
| Language Modeling | The Pile Enron (test) | BPB: 57.32 | 6 |
| Language Modeling | The Pile Github (test) | Bits Per Byte (BPB): 40.88 | 6 |
| Language Modeling | The Pile FreeLaw (test) | BPB (%): 71.63 | 6 |
| Language Modeling | The Pile DM Math (test) | BPB (%): 82.14 | 6 |
Showing 25 of 34 rows