DAQ: Delta-Aware Quantization for Post-Training LLM Weight Compression

About

We introduce Delta-Aware Quantization (DAQ), a data-free post-training quantization framework that preserves the knowledge acquired during post-training. Standard quantization objectives minimize reconstruction error but are agnostic to the base model, allowing quantization noise to disproportionately corrupt the small-magnitude parameter deltas ($\Delta W$) that encode post-training behavior -- an effect we analyze through the lens of quantization as implicit regularization. DAQ replaces reconstruction-based objectives with two delta-aware metrics -- Sign Preservation Rate and Cosine Similarity -- that directly optimize for directional fidelity of $\Delta W$, requiring only the base and post-trained weight matrices. In a pilot FP8 study, DAQ recovers style-specific capabilities lost under standard quantization while maintaining general performance.
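The abstract names the two delta-aware metrics but gives no formulas, so the following is only a minimal sketch of one plausible reading. It assumes $\Delta W = W_{post} - W_{base}$, that the quantized delta is $Q(W_{post}) - W_{base}$, that Sign Preservation Rate is the fraction of entries whose delta sign survives quantization, and that Cosine Similarity is taken over the flattened deltas. The names `delta_metrics` and `fake_quantize` are hypothetical, and the uniform rounding grid is a stand-in for illustration, not the FP8 format used in the pilot study.

```python
import numpy as np

def fake_quantize(w, scale):
    # Stand-in quantizer (assumption): round each weight to a uniform grid.
    # This is NOT FP8; it only illustrates how coarse grids erase small deltas.
    return np.round(w / scale) * scale

def delta_metrics(w_base, w_post, w_quant):
    # Delta before and after quantization, both measured against the base model.
    delta = w_post - w_base
    delta_q = w_quant - w_base
    # Fraction of entries whose delta keeps its sign (zero counts as its own sign).
    sign_rate = float(np.mean(np.sign(delta) == np.sign(delta_q)))
    # Cosine similarity of the flattened deltas; defined as 0.0 if either is all zeros.
    denom = np.linalg.norm(delta) * np.linalg.norm(delta_q)
    cosine = float(np.sum(delta * delta_q) / denom) if denom > 0 else 0.0
    return sign_rate, cosine

# Toy weights: the post-training deltas are much smaller than the weights themselves.
w_base = np.array([1.0, -2.0, 0.5, 3.0])
w_post = w_base + np.array([0.0625, -0.0625, 0.0625, -0.0625])

for scale in (0.25, 0.03125):
    w_q = fake_quantize(w_post, scale)
    recon_err = float(np.max(np.abs(w_q - w_post)))
    print(scale, recon_err, delta_metrics(w_base, w_post, w_q))
```

With the coarse grid, the largest reconstruction error is small relative to the weights, yet every quantized weight snaps back to its base value, so the delta vanishes entirely and both metrics drop to zero. The finer grid preserves the delta. This is the failure mode the abstract attributes to base-agnostic reconstruction objectives.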

Xiaoming Yu, Shize Tang, Guanghua Yu, Linchuan Xie, Song Liu, Jianchen Zhu, Feng Li • 2026

Related benchmarks

Task	Dataset	Result
General Capability Evaluation	General Capability Dataset	--	10
Dialogue Style Evaluation	SFT Dialogue	--	6
Weight Reconstruction Fidelity	DeepSeek-V3 Weights	--	3
