Practical Scaling Laws: Converting Compute into Performance in a Data-Constrained World

About

The scaling laws guiding modern model training were calibrated for a single regime: data-rich, single-epoch pretraining. The dominant form, Chinchilla's $L = E + A/N^\alpha + B/D^\beta$, has three structural limitations outside that regime: it diverges as unique data shrinks instead of saturating at the uninformed baseline; it cannot represent overfitting when capacity exceeds the data; and it conflates total examples seen with unique examples available. We propose a closed-form extension, $L(N, D, T) = E + (L_0 - E)\,h/(1+h)$ with $h = a/N^\alpha + b/T^\beta + c\,N^\gamma/D^\delta$, that decomposes loss into undercapacity, undertraining, and overfitting terms. It saturates between the irreducible loss $E$ and an uninformed baseline $L_0$ fixed by the loss type, and reduces to Chinchilla in the data-rich, single-epoch limit. We validate it on four multi-epoch experiments spanning four architecture families (MLPs, ResNets, Fourier neural operators, and transformers) across vision, scientific ML, and language domains, and refit it to five published LLM scaling-law grids. When extrapolating to higher compute and larger unique data than seen at fit time, our form achieves state-of-the-art RMSE on every published LLM grid we evaluate and on most cells of our constructed experiments. Once calibrated, the form admits a cost-aware allocation that recovers Chinchilla's optimum when data is free and shifts toward smaller corpora and more epochs as data grows expensive.
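To make the two functional forms concrete, here is a minimal Python sketch that evaluates both directly from the formulas in the abstract. Several things in it are assumptions rather than details from the paper: the function names, every numeric constant (chosen only for illustration, not fitted values), the reading of $T$ as total examples seen and $D$ as unique examples, the choice $L_0 = \ln 50{,}000$ (cross-entropy of a uniform guess over a hypothetical 50k-token vocabulary), and the mapping $A = (L_0 - E)\,a$, $B = (L_0 - E)\,b$, which follows from expanding $h/(1+h) \approx h$ for small $h$ with $T = D$.

```python
import math


def chinchilla_loss(N, D, E, A, B, alpha, beta):
    """Chinchilla form: L = E + A / N^alpha + B / D^beta."""
    return E + A / N**alpha + B / D**beta


def saturating_loss(N, D, T, E, L0, a, b, c, alpha, beta, gamma, delta):
    """Extended form from the abstract: L = E + (L0 - E) * h / (1 + h)."""
    undercapacity = a / N**alpha          # too few parameters
    undertraining = b / T**beta           # too few examples seen
    overfitting = c * N**gamma / D**delta # capacity exceeds unique data
    h = undercapacity + undertraining + overfitting
    return E + (L0 - E) * h / (1.0 + h)


# Hypothetical constants for illustration only (NOT fitted values from the paper).
E = 1.7
L0 = math.log(50_000)  # assumed uninformed baseline for a 50k-way cross-entropy
P = dict(E=E, L0=L0, a=40.0, b=40.0, c=0.1,
         alpha=0.34, beta=0.28, gamma=0.5, delta=1.0)

# Small-h matching of coefficients (an assumption derived from h/(1+h) ~ h).
A = (L0 - E) * P["a"]
B = (L0 - E) * P["b"]

# Data-rich, single-epoch regime: T == D and both are large.
data_rich = saturating_loss(N=1e12, D=1e14, T=1e14, **P)
data_rich_ref = chinchilla_loss(N=1e12, D=1e14, E=E, A=A, B=B,
                                alpha=P["alpha"], beta=P["beta"])

# Data-starved regime: a single unique example, many repeated passes.
starved = saturating_loss(N=1e9, D=1.0, T=1e11, **P)
starved_ref = chinchilla_loss(N=1e9, D=1.0, E=E, A=A, B=B,
                              alpha=P["alpha"], beta=P["beta"])

print(f"data-rich: extended={data_rich:.4f}  chinchilla={data_rich_ref:.4f}")
print(f"starved:   extended={starved:.4f}  chinchilla={starved_ref:.4f}  L0={L0:.4f}")
```

Under these assumed constants, the two forms nearly coincide in the data-rich case, while in the data-starved case the extended form stays just below $L_0$ and the Chinchilla form exceeds it, which is the divergence the abstract describes.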

Christopher M. Bryant, Hao Liu • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Scaling-law extrapolation | MNIST high-C holdout | RMSE (log space) | 0.127 | 6 |
| Scaling-law extrapolation | CIFAR-100 high-C holdout | RMSE (log space) | 0.081 | 6 |
| Scaling-law extrapolation | Darcy high-C (holdout) | RMSE (log space) | 0.168 | 6 |
| Scaling-law extrapolation | Chinchilla grid (high-C holdout) | RMSE (log space) | 0.007 | 6 |
| Scaling-law extrapolation | Muennighoff grid high-C holdout | RMSE (log space) | 0.059 | 6 |
| Scaling-law extrapolation | Gadre grid high-C holdout | RMSE (log space) | 0.014 | 6 |
| Scaling-law extrapolation | Porian grid high-C (holdout) | RMSE (log space) | 0.063 | 6 |
| Scaling-law extrapolation | Farseer grid (high-C holdout) | RMSE (log space) | 0.008 | 6 |
| Scaling-law extrapolation | CIFAR-100 high-D holdout | RMSE (log space) | 0.069 | 6 |
| Scaling-law extrapolation | Darcy high-D (holdout) | RMSE (log space) | 0.17 | 6 |
Showing 10 of 18 rows
