| Dataset Name | SOTA Method | Metric | Value | Trend |
|---|---|---|---|---|
| C4 Llama 2 pre-training (val) | RMSProp+Magma | Perplexity | 13.19 | 47 |
| C4 Llama-160M scratch (val) | GPA-AdamW | Validation Loss | 3.0908 | 20 |
| C4 Llama-1B scratch (val) | SF-AdamW | Validation Loss | 2.638 | 16 |
| BERT-Large NVIDIA V100 (train) | Checkpoint | Max Batch Size | 96 | 6 |
| BERT-Large NVIDIA 2080 Ti (train) | Checkpoint | Max Batch Size | 50 | 6 |
| Datacomp-LM 400M-1x scale 1.0 | DCLM-baseline + SIEVE | Core Score | 17.9 | 3 |
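The C4 rows report language-model quality either as perplexity or as validation loss. Assuming the loss is the mean per-token cross-entropy in nats (the table does not state this), perplexity is simply its exponential, so the two are interconvertible. The helper names below are hypothetical. This is a minimal sketch for reading the numbers, not a statement about how the leaderboard computes them. The rows also use different models and setups, so converted values should not be compared across rows.

```python
import math


def loss_to_perplexity(mean_ce_loss_nats: float) -> float:
    """Perplexity = exp(mean cross-entropy), assuming the loss uses natural log."""
    return math.exp(mean_ce_loss_nats)


def perplexity_to_loss(perplexity: float) -> float:
    """Inverse conversion: mean cross-entropy in nats = ln(perplexity)."""
    return math.log(perplexity)


if __name__ == "__main__":
    # Values taken from the table above; the nats assumption is ours.
    for name, loss in [("Llama-160M", 3.0908), ("Llama-1B", 2.638)]:
        print(f"{name}: loss {loss} -> perplexity {loss_to_perplexity(loss):.2f}")
    print(f"Llama 2: perplexity 13.19 -> loss {perplexity_to_loss(13.19):.4f}")
```

If the loss were instead reported in bits (log base 2), the conversion would be `2 ** loss` rather than `math.exp(loss)`.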