Neural Weight Compression for Language Models

About

Efficient compression of language model weights is increasingly critical as model scale and deployment grow. Yet, most existing methods rely on handcrafted transforms and heuristics, reflecting the limited understanding of weights as a data modality. To move beyond this paradigm, we formulate weight compression as neural codec learning and propose Neural Weight Compression (NWC), a framework for training neural codecs on pretrained weight datasets. NWC addresses challenges intrinsic to weight compression, including tensor heterogeneity and the mismatch between reconstruction losses and downstream performance. Experiments show that NWC achieves highly competitive accuracy-compression tradeoffs, with particularly strong results in the 4-6 bit regime, without relying on rigid handcrafted components such as the Hadamard transform. These gains extend across diverse architectures, e.g., vision encoders. Our analysis highlights the roles of entropy-constrained quantization and learned transforms in adapting compression to weight data and downstream tasks.
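
As a rough illustration of the rate-distortion tradeoff behind entropy-constrained quantization, the sketch below quantizes synthetic Gaussian "weights" with a uniform scalar quantizer and reports the empirical entropy of the indices (rate) alongside the reconstruction error (distortion). This is not the NWC codec: there is no learned transform, and the function names, the step sizes, the synthetic weights, and the multiplier `lam` are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def quantize(weights, step):
    """Uniform scalar quantization: map each weight to an integer index."""
    return [round(w / step) for w in weights]


def empirical_entropy_bits(symbols):
    """Empirical entropy, in bits per symbol, of the quantization indices."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def rate_distortion(weights, step, lam):
    """Return (rate, distortion, rate + lam * distortion) for one step size."""
    idx = quantize(weights, step)
    recon = [i * step for i in idx]
    mse = sum((w - r) ** 2 for w, r in zip(weights, recon)) / len(weights)
    rate = empirical_entropy_bits(idx)
    return rate, mse, rate + lam * mse


# Synthetic stand-in for a weight tensor (assumption: zero-mean Gaussian).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

# Coarser steps spend fewer bits per weight but reconstruct less accurately.
for step in (0.002, 0.005, 0.01):
    rate, mse, cost = rate_distortion(weights, step, lam=1e4)
    print(f"step={step}: rate={rate:.2f} bits/weight, mse={mse:.2e}, cost={cost:.2f}")
```

An entropy-constrained scheme would pick the quantizer that minimizes the combined cost rather than the distortion alone; a learned codec would presumably replace the fixed uniform grid with trained transforms and an entropy model.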

Jegwang Ryu, Minkyu Kim, Seungjun Shin, Hee Min Choi, Dokwan Oh, Jaeho Lee • 2025

Related benchmarks

Task | Dataset | Result | Rank
Language Modeling | WikiText-2 | WikiText-2 Score: 6.32 | 32
Reasoning | Reasoning Suite (MMLU-Pro, GPQA Diamond, AIME-24, AIME-25), zero-shot | MMLU-Pro Accuracy: 73.8 | 6
Weight Compression Latency | Weight tensor, 4096 x 4096 | Encoding Latency (s): 1.64 | 4
Zero-shot Task Evaluation | Common-sense Zero-shot Benchmarks | OpenQ (Zero-shot): 34 | 4
