| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Language Modeling | The Pile (test) | PPL (The Pile Test): 9.213 | 27 |
| Language Modeling | The Pile | Perplexity: 4.14 | 25 |
| Language Modeling | The Pile deduplicated (val) | Perplexity: 7.14 | 22 |
| Language Modeling | The Pile (val) | Perplexity (bits/byte): 0.62 | 20 |
| Language Modeling | The Pile non-copyrighted (test) | BPB: 0.557 | 20 |
| Knowledge Unlearning | The Pile 32 sample (val) | EL10 (%): 0 | 15 |
| Membership Inference Attack | The Pile | AUROC: 0.927 | 14 |
| Training Data Extraction | The Pile (train) | Exact Extract Rate: 45 | 10 |
| Data Extraction | The Pile (test) | Fractional Extraction Rate: 63.4 | 10 |
| Language Modeling | The Pile non-copyrighted without Wikipedia (test) | BPB: 0.559 | 8 |
| Membership Inference Attack | The PILE (train test) | Loss: 66.5 | 7 |
| Property-based retrieval | The Pile (test) | MAP: 54.2 | 6 |
| Unsupervised OOD detection | The Pile (ID) Twitter (OOD) (test) | AUROC: 99.22 | 5 |
| Unsupervised OOD detection | The Pile EDGAR Reports ID OOD (test) | AUROC: 68.09 | 5 |
| Unsupervised OOD detection | The Pile ID 4Chan OOD (test) | AUROC: 87.97 | 5 |