Practical Scaling Laws: Converting Compute into Performance in a Data-Constrained World

About

The scaling laws guiding modern model training were calibrated for a single regime: data-rich, single-epoch pretraining. The dominant form, Chinchilla's $L = E + A/N^\alpha + B/D^\beta$, has three structural limitations outside that regime: it diverges as unique data shrinks instead of saturating at the uninformed baseline; it cannot represent overfitting when capacity exceeds the data; and it conflates total examples seen with unique examples available. We propose a closed-form extension, $L(N, D, T) = E + (L_0 - E)\,h/(1+h)$ with $h = a/N^\alpha + b/T^\beta + c\,N^\gamma/D^\delta$, whose three terms capture undercapacity, undertraining, and overfitting, respectively. The loss stays between the irreducible loss $E$ and an uninformed baseline $L_0$ fixed by the loss type, saturating at $L_0$, and the form reduces to Chinchilla in the data-rich, single-epoch limit. We validate it on four multi-epoch experiments spanning four architecture families (MLPs, ResNets, Fourier neural operators, and transformers) across vision, scientific ML, and language domains, and refit it to five published LLM scaling-law grids. When extrapolating to higher compute and more unique data than seen at fit time, our form achieves state-of-the-art RMSE on every published LLM grid we evaluate and on most cells of our constructed experiments. Once calibrated, the form admits a cost-aware allocation that recovers Chinchilla's optimum when data is free and shifts toward smaller corpora and more epochs as data grows expensive.
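To make the contrast between the two forms concrete, here is a minimal Python sketch. It assumes N is the parameter count, D the number of unique examples, and T the total examples seen (our reading of the abstract). It also assumes L0 = ln(10), the cross-entropy of a uniform guess over 10 classes, and uses hypothetical parameter values that are not fitted values from the paper. The sweep trains for a single epoch (T = D) while shrinking D. The saturating form stays between E and L0. A Chinchilla form matched to first order in h, with A = (L0 - E)a and B = (L0 - E)b, grows without bound.

```python
import math

# Hypothetical parameters for illustration only; these are NOT fitted values
# from the paper. L0 = ln(10) assumes a uniform guess over 10 classes.
PARAMS = dict(E=0.05, L0=math.log(10), a=50.0, alpha=0.5,
              b=200.0, beta=0.5, c=1e-3, gamma=0.5, delta=0.8)


def saturating_loss(N, D, T, E, L0, a, alpha, b, beta, c, gamma, delta):
    """L(N, D, T) = E + (L0 - E) * h / (1 + h).

    Assumed reading: N = parameters, D = unique examples,
    T = total examples seen during training.
    """
    undercapacity = a / N**alpha
    undertraining = b / T**beta
    overfitting = c * N**gamma / D**delta
    h = undercapacity + undertraining + overfitting
    # For h >= 0, h / (1 + h) lies in [0, 1), so the loss stays in [E, L0).
    return E + (L0 - E) * h / (1.0 + h)


def chinchilla_loss(N, D, E, A, alpha, B, beta):
    """Chinchilla form L = E + A / N**alpha + B / D**beta (unbounded as D -> 0)."""
    return E + A / N**alpha + B / D**beta


def matched_chinchilla(N, D, p):
    """Chinchilla with A = (L0 - E) * a and B = (L0 - E) * b.

    This matches the saturating form to first order in h, because
    h / (1 + h) = h - h**2 + ... for small h.
    """
    scale = p["L0"] - p["E"]
    return chinchilla_loss(N, D, p["E"], scale * p["a"], p["alpha"],
                           scale * p["b"], p["beta"])


# Single-epoch training (T = D) on a shrinking number of unique examples.
N = 1e8
sweep = [10.0**k for k in range(10)]  # D = 1, 10, ..., 1e9
proposed = [saturating_loss(N, D, D, **PARAMS) for D in sweep]
chinchilla = [matched_chinchilla(N, D, PARAMS) for D in sweep]

for D, lp, lc in zip(sweep, proposed, chinchilla):
    print(f"D={D:8.0e}  saturating={lp:7.4f}  chinchilla={lc:9.4f}")
```

At D = 1 the matched Chinchilla value lies far above L0, while the saturating form stays just below it. At D = 1e9, h is small and the two forms differ by less than 1e-3, which illustrates the data-rich, single-epoch limit.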

Christopher M. Bryant, Hao Liu · 2026

Related benchmarks

Task	Dataset	Metric	Result	#
Scaling-law extrapolation	MNIST (high-C holdout)	RMSE (log space)	0.127	6
Scaling-law extrapolation	CIFAR-100 (high-C holdout)	RMSE (log space)	0.081	6
Scaling-law extrapolation	Darcy (high-C holdout)	RMSE (log space)	0.168	6
Scaling-law extrapolation	Chinchilla grid (high-C holdout)	RMSE (log space)	0.007	6
Scaling-law extrapolation	Muennighoff grid (high-C holdout)	RMSE (log space)	0.059	6
Scaling-law extrapolation	Gadre grid (high-C holdout)	RMSE (log space)	0.014	6
Scaling-law extrapolation	Porian grid (high-C holdout)	RMSE (log space)	0.063	6
Scaling-law extrapolation	Farseer grid (high-C holdout)	RMSE (log space)	0.008	6
Scaling-law extrapolation	CIFAR-100 (high-D holdout)	RMSE (log space)	0.069	6
Scaling-law extrapolation	Darcy (high-D holdout)	RMSE (log space)	0.17	6

(10 of 18 benchmark rows shown.)
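The benchmark rows report extrapolation error as RMSE in log space on holdout points beyond the fit range. Below is a minimal Python sketch of that metric. It rests on three assumptions: natural logarithms, a "high-C" holdout defined as runs at or above a compute threshold, and predictions already produced by a law fitted only on the runs below that threshold. The function names, the threshold rule, and the toy numbers are hypothetical, not data from the paper.

```python
import math


def log_rmse(predicted, observed):
    """RMSE between natural logs of predicted and observed losses (assumed base e)."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(math.log(p) - math.log(o)) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def high_compute_split(runs, threshold):
    """Hypothetical rule: runs below `threshold` are used for fitting, the rest are held out."""
    fit = [r for r in runs if r["compute"] < threshold]
    holdout = [r for r in runs if r["compute"] >= threshold]
    return fit, holdout


# Toy runs (made-up numbers). "predicted" stands in for the output of a law
# fitted only on the fit split.
runs = [
    {"compute": 1e15, "observed": 3.20, "predicted": 3.10},
    {"compute": 1e16, "observed": 2.90, "predicted": 2.95},
    {"compute": 1e17, "observed": 2.60, "predicted": 2.70},
    {"compute": 1e18, "observed": 2.40, "predicted": 2.30},
]
fit, holdout = high_compute_split(runs, threshold=1e17)
score = log_rmse([r["predicted"] for r in holdout],
                 [r["observed"] for r in holdout])
print(f"fit runs: {len(fit)}, holdout runs: {len(holdout)}, log-RMSE: {score:.4f}")
```

Because each error term is log(predicted / observed), the metric behaves like a relative error. A prediction off by a fixed factor contributes the same amount whether the loss is large or small.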
