Neural Weight Compression for Language Models

About

Efficient compression of language model weights is increasingly critical as model scale and deployment grow. Yet most existing methods rely on handcrafted transforms and heuristics, reflecting a limited understanding of weights as a data modality. To move beyond this paradigm, we formulate weight compression as neural codec learning and propose Neural Weight Compression (NWC), a framework for training neural codecs on pretrained weight datasets. NWC addresses challenges intrinsic to weight compression, including tensor heterogeneity and the mismatch between reconstruction losses and downstream performance. Experiments show that NWC achieves highly competitive accuracy-compression tradeoffs, with particularly strong results in the 4-6 bit regime, without relying on rigid handcrafted components such as the Hadamard transform. These gains extend across diverse architectures, e.g., vision encoders. Our analysis highlights the roles of entropy-constrained quantization and learned transforms in adapting compression to weight data and downstream tasks.
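
To make the term "entropy-constrained quantization" concrete, here is a minimal sketch of the underlying rate-distortion tradeoff. It is not the NWC codec: it assumes a plain uniform scalar quantizer with no learned transform, synthetic Gaussian values standing in for real weights, and an empirical-entropy rate estimate. The names `quantize`, `entropy_bits`, `rate_distortion`, and the tradeoff weight `lam` are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def quantize(weights, step):
    """Uniform scalar quantization: map each weight to an integer symbol."""
    return [round(w / step) for w in weights]


def entropy_bits(symbols):
    """Shannon entropy (bits per symbol) of the empirical symbol distribution.

    This approximates the rate an ideal entropy coder would need.
    """
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def rate_distortion(weights, step, lam):
    """Return (rate, distortion, distortion + lam * rate) for one step size."""
    symbols = quantize(weights, step)
    recon = [s * step for s in symbols]  # dequantize
    rate = entropy_bits(symbols)         # bits per weight
    distortion = sum((w - r) ** 2 for w, r in zip(weights, recon)) / len(weights)
    return rate, distortion, distortion + lam * rate


# Assumption: synthetic Gaussian values stand in for a weight tensor.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(10000)]

# A coarser step lowers the rate and raises the reconstruction error.
results = {step: rate_distortion(weights, step, lam=1e-5) for step in (0.005, 0.02)}
for step, (rate, dist, cost) in results.items():
    print(f"step={step}: rate={rate:.2f} bits/weight, mse={dist:.2e}, cost={cost:.2e}")
```

An entropy-constrained scheme picks its quantizer to minimize a combined cost of this form rather than fixing the bit width in advance. A learned codec would presumably replace the fixed step and empirical histogram with trained transforms and a learned probability model, which this sketch does not attempt.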

Jegwang Ryu, Minkyu Kim, Seungjun Shin, Hee Min Choi, Dokwan Oh, Jaeho Lee • 2025

Related benchmarks

Task	Dataset	Result
Language Modeling	WikiText-2	WikiText-2 Score: 6.32	86
Reasoning	Reasoning Suite (MMLU-Pro, GPQA Diamond, AIME-24, AIME-25) zero-shot	MMLU-Pro Accuracy: 73.8	6
Weight Compression Latency	weight tensor 4096 x 4096	Encoding Latency (s): 1.64	4
Zero-shot Task Evaluation	Common-sense Zero-shot Benchmarks	OpenQ (Zero-shot): 34	4

