LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

About

Existing scaling laws for Large Language Models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance deteriorates despite increased compute. We propose the Shannon Scaling Law, a unified theoretical framework that models LLM training as information transmission over a noisy channel, grounded in the Shannon-Hartley theorem. By mapping model parameters to channel bandwidth and training tokens to signal power, our formulation explicitly captures the interaction between learning signal and intrinsic noise. This perspective reveals a fundamental Shannon capacity for LLMs: scaling model size or data without preserving a sufficient signal-to-noise ratio (SNR) inevitably amplifies noise, inducing a transition from monotonic improvement to U-shaped performance degradation. We validate our theory through experiments on Pythia and OLMo2 under perturbations, including Gaussian noise, quantization and supervised fine-tuning on math, QA and code tasks. The Shannon Scaling Law consistently outperforms classical scaling laws and recent perturbation-aware laws, achieving strong $R^2$ scores and accurately capturing loss basins missed by prior approaches. It also extrapolates: fitted on $\leq$6.9B Pythia models with $\leq$180B tokens, it predicts the unseen 12B model up to 307B tokens at pooled $R^2{=}0.847$, while monotonic baselines collapse.
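For orientation, the classical Shannon-Hartley theorem that the abstract builds on gives the capacity of a band-limited Gaussian channel as C = B log2(1 + S/N), where the noise power is N = N0 · B for a noise density N0. The sketch below implements only that textbook formula, not the paper's fitted law, whose exact functional form is not given on this page. The function names and numeric values are illustrative assumptions.

```python
import math


def channel_capacity(bandwidth, signal_power, noise_density):
    """Textbook Shannon-Hartley capacity: C = B * log2(1 + S / (N0 * B))."""
    snr = signal_power / (noise_density * bandwidth)
    # log1p keeps precision when the SNR becomes very small at large bandwidth.
    return bandwidth * math.log1p(snr) / math.log(2)


def wideband_limit(signal_power, noise_density):
    """Capacity limit as bandwidth grows without bound: S / (N0 * ln 2)."""
    return signal_power / (noise_density * math.log(2))


if __name__ == "__main__":
    # Hypothetical values chosen only to show the saturation behaviour.
    S, N0 = 1.0, 1.0
    for B in (1, 10, 100, 1000):
        print(B, round(channel_capacity(B, S, N0), 4))
    print("limit", round(wideband_limit(S, N0), 4))
```

In the abstract's analogy, bandwidth plays the role of parameter count and signal power that of training tokens. At fixed signal power the capacity saturates as bandwidth grows, which illustrates why widening the channel alone gives diminishing returns. The U-shaped degradation the paper reports depends on its own noise model, which this sketch does not attempt to reproduce.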

Xu Ouyang, Deyi Liu, Yuhang Cai, Jing Liu, Yuan Yang, Chen Zheng, Thomas Hartvigsen, Yiyuan Ma • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Model Extrapolation | Pythia k=3 (1B, 410M, 160M) | Pooled $R^2$ | 0.605 | 8 |
| Model Extrapolation | Pythia k=4 (≤2.8B) | Pooled $R^2$ | 0.837 | 8 |
| Model Extrapolation | Pythia ≤6.9B (k=5) | Pooled $R^2$ | 0.847 | 8 |
| Scaling Law Modeling | Pythia AWQ 4-bit | $R^2$ score | 0.9935 | 8 |
| Scaling Law Modeling | Pythia bnb 4-bit | $R^2$ score | 99.36 | 8 |
| Scaling Law Modeling | Pythia quanto 2-bit | $R^2$ score | 0.9031 | 8 |
| Token Extrapolation | Pythia Predict 75.5B–307B | Pooled $R^2$ | 80.5 | 8 |
| Token Extrapolation | Pythia Predict 180.4B–307B | Pooled $R^2$ | 0.781 | 8 |
| Token Extrapolation | Pythia Predict 272.6B–307B | Pooled $R^2$ | 0.945 | 8 |
| Scaling Law Fitting | Pythia Suite | Performance (4-bit) | 99.53 | 7 |
Showing 10 of 14 rows
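
As a reading aid for the $R^2$ figures, here is a minimal sketch of the coefficient of determination and one plausible reading of "pooled". This page does not define the term, so the sketch assumes it means a single $R^2$ computed over the concatenation of all evaluated series. The helper names and that pooling convention are assumptions, not the authors' code.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def pooled_r_squared(series):
    """Assumed pooling: concatenate every (observed, predicted) pair of lists,
    then compute one R^2 over all points together."""
    observed = [o for obs, _ in series for o in obs]
    predicted = [p for _, pred in series for p in pred]
    return r_squared(observed, predicted)


if __name__ == "__main__":
    # Hypothetical loss values for two evaluation series.
    series = [
        ([2.9, 2.7, 2.6], [2.88, 2.72, 2.61]),
        ([3.4, 3.1], [3.35, 3.15]),
    ]
    print(pooled_r_squared(series))
```

A perfect fit gives 1, predicting the mean everywhere gives 0, and a fit worse than the mean gives a negative value.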
