| 2-stage multi-lingual (test) | Sedova | R^2 (C) 0.87 | | 10 | 1d ago |
| Japanese, Indonesian, and Swahili 1-stage data only | M3 Scaling Law | C (R^2) 0.79 | | 6 | 1d ago |
| Farseer grid (high-D holdout) | Ours (Eq. (2)) | RMSE (log space) 0.005 | | 6 | 22d ago |
| Porian grid high-D (holdout) | Ours (Eq. (2)) | RMSE (log space) 0.033 | | 6 | 22d ago |
| Gadre grid (high-D holdout) | Ours (Eq. (2)) | RMSE (log space) 0.014 | | 6 | 22d ago |
| Muennighoff grid (high-D holdout) | Ours (Eq. (2)) | RMSE (log space) 0.044 | | 6 | 22d ago |
| Chinchilla grid (high-D holdout) | Ours (Eq. (2)) | RMSE (log space) 0.01 | | 6 | 22d ago |
| TinyStories high-D holdout | Ours (Eq. (2)) | RMSE (log space) 0.053 | | 6 | 22d ago |
| Darcy high-D (holdout) | Ours (Eq. (2)) | RMSE (log space) 0.17 | | 6 | 22d ago |
| CIFAR-100 high-D holdout | Ours (Eq. (2)) | RMSE (log space) 0.069 | | 6 | 22d ago |
| MNIST high-D holdout | Muennighoff | RMSE (log space) 0.122 | | 6 | 22d ago |
| Farseer grid (high-C holdout) | Ours (Eq. (2)) | RMSE (log space) 0.008 | | 6 | 22d ago |
| Porian grid high-C (holdout) | Ours (Eq. (2)) | RMSE (log space) 0.063 | | 6 | 22d ago |
| Gadre grid high-C holdout | Ours (Eq. (2)) | RMSE (log space) 0.014 | | 6 | 22d ago |
| Muennighoff grid high-C holdout | Ours (Eq. (2)) | RMSE (log space) 0.059 | | 6 | 22d ago |
| Chinchilla grid (high-C holdout) | Ours (Eq. (2)) | RMSE (log space) 0.007 | | 6 | 22d ago |
| TinyStories high-C holdout | Muennighoff | RMSE (log space) 0.095 | | 6 | 22d ago |
| Darcy high-C (holdout) | Ours (Eq. (2)) | RMSE (log space) 0.168 | | 6 | 22d ago |
| CIFAR-100 high-C holdout | Ours (Eq. (2)) | RMSE (log space) 0.081 | | 6 | 22d ago |
| MNIST high-C holdout | Ours (Eq. (2)) | RMSE (log space) 0.127 | | 6 | 22d ago |
| Japanese, Indonesian, and Swahili Multi-lingual single-epoch both stages (test) | M3 Scaling Law | C Score 0.9 | | 5 | 1d ago |
| Japanese, Indonesian, and Swahili Monolingual Multi-epoch 1-stage (test) | M3+R*M(k) | R^2 (C) 0.88 | | 5 | 1d ago |