Robust Ultra Low-Bit Post-Training Quantization via Stable Diagonal Curvature Estimate
About
Large Language Models (LLMs) are widely used across many domains, but their scale makes deployment challenging. Post-Training Quantization (PTQ) reduces memory footprint without retraining by leveraging a small calibration set. Recent Hessian-based PTQ methods compensate quantization error via cross-channel dependencies, but such approaches degrade at low bit-widths due to noisy curvature estimates from limited calibration data. We propose DASH-Q, a robust PTQ framework using a diagonal Hessian approximation and iterative weighted least squares. By discarding noise-prone cross-channel dependencies, DASH-Q filters sampling noise while prioritizing the preservation of salient feature power. DASH-Q outperforms other PTQ baselines in the ultra low-bit regime, improving zero-shot accuracy by 7.01% on average and by up to 14.01% over the strongest baselines across five LLMs, while remaining robust and stable even with very small calibration sets.
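The core idea above can be illustrated with a minimal sketch. This is not the authors' implementation; it only shows the general pattern of (1) estimating a diagonal Hessian of the layer reconstruction loss from calibration activations and (2) scoring per-channel quantization scales by the diagonal-Hessian-weighted squared error, so high-curvature (salient) weights dominate the fit. All function names, the scale-search grid, and the round-to-nearest quantizer are illustrative assumptions.

```python
import numpy as np

def diagonal_hessian(X):
    # Diagonal of the Gauss-Newton Hessian 2 * X^T X for the layer
    # reconstruction loss ||XW - XQ||^2; cross-channel (off-diagonal)
    # terms are discarded, which filters calibration sampling noise.
    return 2.0 * np.sum(X * X, axis=0)

def quantize_rtn(w, scale, bits):
    # Symmetric round-to-nearest quantization at the given scale.
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def diag_weighted_channel_quant(w, h, bits=2, grid=80):
    # Hypothetical per-channel scale search: candidates are scored by
    # the diagonal-Hessian-weighted error sum_i h_i * (w_i - q_i)^2,
    # so weights with high curvature are preserved preferentially.
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.abs(w).max()
    best_scale, best_err = max_abs / qmax, np.inf
    for s in np.linspace(0.3, 1.0, grid) * max_abs / qmax:
        q = quantize_rtn(w, s, bits)
        err = np.sum(h * (w - q) ** 2)
        if err < best_err:
            best_scale, best_err = s, err
    return quantize_rtn(w, best_scale, bits), best_err
```

Because the search grid includes the naive max-abs scale (the factor-1.0 candidate), the weighted error of the selected scale is never worse than naive round-to-nearest under the same weighting.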
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 | Perplexity (PPL) | 5.51 | 1624 |
| Zero-shot Reasoning | Zero-Shot Reasoning Tasks (ARC-C, ARC-E, BoolQ, Hella, OBQA, PIQA, SIQA, Wino) | ARC-C Accuracy | 58.36 | 54 |