| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Language Modeling | The Pile | Perplexity | 4.14 | 94 |
| Language Modeling | The Pile (test) | PPL (The Pile Test) | 9.213 | 51 |
| Language Modeling | The Pile (val) | Perplexity (bits/byte) | 0.62 | 31 |
| Language Modeling | The Pile deduplicated (val) | Perplexity | 7.14 | 22 |
| Language Modeling | The Pile non-copyrighted (test) | BPB | 0.557 | 20 |
| Knowledge Unlearning | The Pile 32 sample (val) | EL10 (%) | 0 | 15 |
| Membership Inference Attack | The Pile | AUROC | 0.927 | 14 |
| Language Modeling | The Pile (eval) | Perplexity (PPL) | 14.1 | 12 |
| Training Data Extraction | The Pile (train) | Exact Extract Rate | 45 | 10 |
| Data Extraction | The Pile (test) | Fractional Extraction Rate | 63.4 | 10 |
| Language Modeling | The Pile non-copyrighted without Wikipedia (test) | BPB | 0.559 | 8 |
| General NLP Evaluation | The Pile Downstream Evaluation Suite | HellaSwag Accuracy | 29.7 | 7 |
| Membership Inference Attack | The PILE (train test) | Loss | 66.5 | 7 |
| Property-based retrieval | The Pile (test) | MAP | 54.2 | 6 |
| Knowledge Distillation | The Pile | Raw KL Divergence | 1,200 | 5 |
| Unsupervised OOD detection | The Pile (ID) Twitter (OOD) (test) | AUROC | 99.22 | 5 |
| Unsupervised OOD detection | The Pile EDGAR Reports ID OOD (test) | AUROC | 68.09 | 5 |
| Unsupervised OOD detection | The Pile ID 4Chan OOD (test) | AUROC | 87.97 | 5 |
| Language Modeling | The Pile PubMed Central (test) | PPL | 7.25 | 2 |
| Language Modeling | The Pile Github (test) | Perplexity (PPL) | 3.42 | 2 |
| Language Modeling | The Pile FreeLaw (test) | Perplexity (PPL) | 4.85 | 2 |
| Language Modeling | The Pile DM Math (test) | Perplexity | 7.81 | 2 |
| Language Modeling | The Pile ArXiv (test) | Perplexity (PPL) | 9.92 | 2 |