JacQuant: STE-Free Quantization-Aware Training via Learned Jacobian Surrogates

About

Quantization-aware training (QAT) is widely deployed but typically relies on the Straight-Through Estimator (STE), which passes gradients through non-differentiable quantizers by fiat. This often makes training brittle near bin boundaries and weakly aligned with the actual behavior of the low-precision model. We introduce JacQuant, a QAT framework that learns a lightweight surrogate of the model's local sensitivity to parameter changes and uses it to stabilize and accelerate training within standard variance-reduced optimizers. The surrogate is inexpensive (diagonal or block-diagonal), data-driven, and compatible with common weight and activation quantizers. On code-preserving training phases, we prove convergence for non-convex objectives and obtain linear rates under a PL condition, and we relate the learned sensitivity to end-to-end output fidelity via a simple calibration argument. Across LLM benchmarks at $\leq 2$ bits, JacQuant consistently reaches higher accuracy than STE-based QAT, and the runtime analyses on various models show that the added cost remains negligible under practical group sizes. The method is drop-in and requires no changes to the forward quantizers; our empirical claims are scoped to ultra-low-bit LLM QAT.
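To make the STE behavior described above concrete, here is a minimal sketch of a uniform quantizer with a straight-through gradient. It illustrates only the baseline STE convention, not JacQuant's surrogate. The names `quantize`, `ste_loss_grad`, the step size 0.5, and the target 0.6 are hypothetical choices made for illustration and do not come from the paper.

```python
import math


def quantize(w, step=0.5):
    """Uniform round-to-nearest quantizer (step size is an illustrative assumption)."""
    # floor(x + 0.5) rounds halves up, avoiding Python's banker's rounding.
    return step * math.floor(w / step + 0.5)


def loss(w, target=0.6, step=0.5):
    """Squared error measured on the quantized weight."""
    return (quantize(w, step) - target) ** 2


def ste_loss_grad(w, target=0.6, step=0.5):
    """Gradient of loss w.r.t. w under STE: d quantize(w) / dw is treated as 1."""
    upstream = 2.0 * (quantize(w, step) - target)
    return upstream * 1.0  # the true derivative is 0 almost everywhere


# Two weights on either side of the bin boundary at 0.75.
below, above = 0.74, 0.76
print(quantize(below), ste_loss_grad(below))
print(quantize(above), ste_loss_grad(above))
```

In this toy setting the two weights differ by only 0.02, yet they land in different bins and receive STE gradients of opposite sign, so a gradient step from either side pushes the weight back across the boundary. This is one simple way to see the brittleness near bin boundaries that the abstract attributes to STE-based training.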

Kai Yi, Vignesh Vivekraja, Harshit Khaitan, Steven Li • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Zero-shot Evaluation | Eight datasets average | Accuracy | 56.8 | 112 |
| Language Modeling | WikiText-2 (val) | Perplexity (BVS) | 11.8 | 70 |
| Reasoning | Reasoning Benchmarks ARC-e, ARC-c, BoolQ, PIQA, SIQA, HellaS., OBQA, Wino. | ARC-e Accuracy | 67.3 | 38 |
| Language Modeling | WikiText-2 | WikiText-2 Score | 12.7 | 32 |
| Reasoning | Reasoning Benchmarks Zero-shot | Overall Zero-Shot Accuracy | 57.1 | 26 |
| Language Modeling | WikiText-2 | Perplexity | 11.8 | 22 |
| Zero-shot Reasoning | Downstream Reasoning Tasks (WikiText-2, ARC-e, ARC-c, BoolQ, PIQA, SIQA, HellaS., OBQA, Wino.) | WikiText-2 Acc | 11.69 | 6 |