Astro: Activation-guided Structured Regularization for Outlier-Robust LLM Post-Training Quantization
About
Weight-only post-training quantization (PTQ) is crucial for efficient Large Language Model (LLM) deployment but suffers from accuracy degradation caused by weight and activation outliers. Existing mitigation strategies face critical limitations: they either suppress outliers insufficiently or incur significant deployment inefficiencies, such as added inference latency, heavy preprocessing, or reliance on complex operator fusion. To resolve these limitations, we leverage a key insight: over-parameterized LLMs often converge to flat minima, implying a vast space of equivalent solutions in which weights can be adjusted without compromising accuracy. Building on this, we propose Astro, an Activation-guided Structured Regularization framework designed to suppress the negative effects of outliers in a hardware-friendly and efficient manner. Using an activation-guided regularization objective, Astro reconstructs intrinsically robust weights, aggressively suppressing the weight outliers that correspond to high-magnitude activations without sacrificing model accuracy. Crucially, Astro adds no inference latency and is orthogonal to mainstream quantization methods such as GPTQ. Extensive experiments show that Astro achieves highly competitive performance; notably, on LLaMA-2-7B, it outperforms complex learning-based rotation methods while using roughly one third of the quantization time.
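To make the outlier problem concrete, the following minimal sketch shows how a single large weight inflates the quantization scale of its group and degrades every other weight in that group. It is not Astro's algorithm: the symmetric round-to-nearest quantizer, the group size of 128, the 4-bit width, the value ranges, and the function names `quantize_rtn` and `mse` are all assumptions chosen for illustration.

```python
import random


def quantize_rtn(weights, bits=4):
    """Symmetric round-to-nearest quantization with one shared scale.

    Illustrative only: an assumed baseline quantizer, not the method
    described in the abstract.
    """
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    dequantized = []
    for w in weights:
        q = round(w / scale)                   # nearest integer level
        q = max(-qmax - 1, min(qmax, q))       # clamp to the signed range
        dequantized.append(q * scale)          # map back to float
    return dequantized


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)

# A group of 128 "ordinary" weights (assumed range, for illustration).
weights = [rng.uniform(-0.1, 0.1) for _ in range(128)]

# The same group with its last weight replaced by one large outlier.
with_outlier = weights[:-1] + [2.0]

err_clean = mse(weights, quantize_rtn(weights))

# Measure the error on the 127 ordinary weights only, so the outlier's
# own reconstruction error does not enter the comparison.
deq_outlier = quantize_rtn(with_outlier)
err_outlier = mse(with_outlier[:-1], deq_outlier[:-1])

print("MSE without outlier:", err_clean)
print("MSE on ordinary weights with outlier:", err_outlier)
```

With these assumed values the outlier stretches the scale so far that every ordinary weight rounds to zero, which is the kind of degradation that outlier suppression aims to avoid.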
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText2 | Perplexity | 3.38 | 1875 |
| Language Understanding | MMLU 5-shot | Accuracy | 66.9 | 132 |
| Question Answering | TriviaQA 5-shot | Accuracy | 77.5 | 30 |
| Question Answering | ARC-E 0-shot | Accuracy | 77.1 | 29 |
| Common Sense Reasoning | HellaSwag 0-shot | Accuracy | 81.7 | 22 |
| Common Sense Reasoning | ARC-Challenge 0-shot | Accuracy | 54.8 | 19 |
| Question Answering | Natural Questions (NQ) 5-shot | Accuracy | 35.8 | 16 |
| Common Sense Reasoning | ARC-Easy (ARC-E) 0-shot | Accuracy | 79.3 | 12 |
| Question Answering | ARC-C 0-shot | Accuracy | 48 | 4 |
| Sentence Completion | HellaSwag 0-shot | Accuracy | 74 | 4 |