| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Language Modeling | The Pile | Perplexity | 2.53 | 129 |
| Language Modeling | The Pile (test) | PPL (The Pile Test) | 9.213 | 53 |
| Language Modeling | The Pile (val) | Perplexity (bits/byte) | 0.62 | 31 |
| Language Modeling | The Pile deduplicated (val) | Perplexity | 7.14 | 22 |
| Language Modeling | The Pile non-copyrighted (test) | BPB | 0.557 | 20 |
| Language Modeling | The Pile (val) | Loss | 1.6923 | 15 |
| Knowledge Unlearning | The Pile 32 sample (val) | EL10 (%) | 0 | 15 |
| Membership Inference Attack | The Pile | AUROC | 0.927 | 14 |
| Language Modeling | The Pile (eval) | Perplexity (PPL) | 14.1 | 12 |
| Training Data Extraction | The Pile (train) | Exact Extract Rate | 45 | 10 |
| Data Extraction | The Pile (test) | Fractional Extraction Rate | 63.4 | 10 |
| Language Modeling | The Pile non-copyrighted without Wikipedia (test) | BPB | 0.559 | 8 |
| General NLP Evaluation | The Pile Downstream Evaluation Suite | HellaSwag Accuracy | 29.7 | 7 |
| Membership Inference Attack | The PILE (train test) | Loss | 66.5 | 7 |
| Language Modeling | The Pile Wikipedia (test) | BPB | 66.71 | 6 |
| Language Modeling | The Pile USPTO (test) | BPB | 65.7 | 6 |
| Language Modeling | The Pile StackEx. (test) | BPB (%) | 80.37 | 6 |
| Language Modeling | The Pile PubMed Cent. (test) | BPB (%) | 84.13 | 6 |
| Language Modeling | The Pile PubMed Abs. (test) | BPB (%) | 87.7 | 6 |
| Language Modeling | The Pile NIH (test) | BPB | 66.05 | 6 |
| Language Modeling | The Pile HackerNews (test) | BPB (Bits Per Byte) | 0.8583 | 6 |
| Language Modeling | The Pile Enron (test) | BPB | 57.32 | 6 |
| Language Modeling | The Pile Github (test) | Bits Per Byte (BPB) | 40.88 | 6 |
| Language Modeling | The Pile FreeLaw (test) | BPB (%) | 71.63 | 6 |
| Language Modeling | The Pile DM Math (test) | BPB (%) | 82.14 | 6 |