LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

About

Existing scaling laws for Large Language Models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance deteriorates despite increased compute. We propose the Shannon Scaling Law, a unified theoretical framework that models LLM training as information transmission over a noisy channel, grounded in the Shannon-Hartley theorem. By mapping model parameters to channel bandwidth and training tokens to signal power, our formulation explicitly captures the interaction between learning signal and intrinsic noise. This perspective reveals a fundamental Shannon capacity for LLMs: scaling model size or data without preserving a sufficient signal-to-noise ratio (SNR) inevitably amplifies noise, inducing a transition from monotonic improvement to U-shaped performance degradation. We validate our theory through experiments on Pythia and OLMo2 under perturbations, including Gaussian noise, quantization and supervised fine-tuning on math, QA and code tasks. The Shannon Scaling Law consistently outperforms classical scaling laws and recent perturbation-aware laws, achieving strong $R^2$ scores and accurately capturing loss basins missed by prior approaches. It also extrapolates: fitted on $\leq$6.9B Pythia models with $\leq$180B tokens, it predicts the unseen 12B model up to 307B tokens at pooled $R^2{=}0.847$, while monotonic baselines collapse.
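For reference, the classical Shannon-Hartley theorem that the framework is grounded in gives channel capacity as C = B log2(1 + S/N). The sketch below evaluates only that classical formula. The abstract does not state the paper's fitted loss function, so this is not it. Reading `bandwidth` as model parameters and `signal_power` as training tokens follows the mapping described above. The function name and the numeric values are hypothetical and chosen purely for illustration.

```python
import math


def shannon_capacity(bandwidth: float, signal_power: float, noise_power: float) -> float:
    """Classical Shannon-Hartley capacity: C = B * log2(1 + S/N).

    Illustrative only: this is the textbook formula, not the paper's fitted law.
    """
    if bandwidth <= 0 or signal_power < 0 or noise_power <= 0:
        raise ValueError("bandwidth and noise_power must be > 0; signal_power must be >= 0")
    return bandwidth * math.log2(1.0 + signal_power / noise_power)


# Hypothetical values: hold bandwidth and signal power fixed and raise the noise.
# Capacity falls as the signal-to-noise ratio falls.
for noise in (1.0, 4.0, 16.0):
    capacity = shannon_capacity(bandwidth=2.0, signal_power=3.0, noise_power=noise)
    print(f"noise={noise:5.1f}  SNR={3.0 / noise:6.3f}  capacity={capacity:.3f}")
```

With bandwidth and signal power held fixed, capacity in this formula depends only on the SNR, which is the quantity the abstract says must be preserved when scaling.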

Xu Ouyang, Deyi Liu, Yuhang Cai, Jing Liu, Yuan Yang, Chen Zheng, Thomas Hartvigsen, Yiyuan Ma • 2026

Related benchmarks

Task	Dataset	Result
Model Extrapolation	Pythia k=3 (1B, 410M, 160M)	Pooled R^2: 0.605	8
Model Extrapolation	Pythia k=4 (≤2.8B)	Pooled R^2: 0.837	8
Model Extrapolation	Pythia ≤6.9B (k=5)	Pooled R^2: 0.847	8
Scaling Law Modeling	Pythia AWQ 4-bit	R^2 Score: 0.9935	8
Scaling Law Modeling	Pythia bnb 4-bit	R^2 Score: 99.36	8
Scaling Law Modeling	Pythia quanto 2-bit	R^2 Score: 0.9031	8
Token Extrapolation	Pythia Predict 75.5B–307B	Pooled R^2: 80.5	8
Token Extrapolation	Pythia Predict 180.4B–307B	Pooled R^2: 0.781	8
Token Extrapolation	Pythia Predict 272.6B–307B	Pooled R^2: 0.945	8
Scaling Law Fitting	Pythia Suite	Performance (4-bit): 99.53	7

Showing 10 of 14 rows
