Massive Spikes in LLMs are Bias Vectors: Mechanistic Uncovering and Spike-Free Quantization

About

Massive activation spikes in Large Language Models (LLMs) severely degrade quantization by stretching dynamic ranges. While prior hypotheses characterize these as high-level scalar biases, we argue that they are merely the scalar intermediates of rigid, structural vector biases in the spike-carrying tokens. We show that these tokens converge to constant vectors after normalization that drive the attention sink and value-state drain mechanisms. We geometrically substantiate this by analyzing the coordination of projection weights: $W_K$ contrastively amplifies the vector, $W_Q$ aligns semantic tokens toward it, and $W_V$ projects it into the spectral null-space. Furthermore, we reveal that the model actively preserves these structural biases against Rotary Positional Embedding (RoPE) perturbations by localizing them in "zones of rotational stability" utilizing low-frequency bands and coherent channel pairs. Leveraging this, we propose INSERTQUANT, a post-training quantization (PTQ) framework that clamps spikes and restores their function via pre-computed template vectors. This renders activations strictly spike-free, enabling robust low-bit quantization with high fidelity. INSERTQUANT achieves parity with state-of-the-art per-tensor quantization methods on LLMs and uniquely generalizes beyond text to other modalities such as ViTs.
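The dynamic-range problem named in the abstract's first sentence can be illustrated with a toy experiment. The sketch below is not INSERTQUANT: it only shows, under assumed values, how a single large activation inflates the scale of symmetric per-tensor quantization and how clamping that activation shrinks the error on the remaining entries. The function names, the spike magnitude (1000), and the clamp threshold (6) are hypothetical choices for illustration. The template-vector restoration step described in the abstract is not modeled here.

```python
import numpy as np

def quantize_per_tensor(x, bits=8):
    """Symmetric per-tensor fake quantization: one scale for the whole tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax           # scale is set by the largest magnitude
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale                           # dequantized values

rng = np.random.default_rng(0)
acts = rng.standard_normal((16, 64))           # toy activations: 16 tokens x 64 channels

spiky = acts.copy()
spiky[0, 3] = 1000.0                           # assumed massive spike in one token/channel

# Mask selecting every entry except the spike position.
normal = np.ones_like(acts, dtype=bool)
normal[0, 3] = False

def mse_on_normal(x):
    """Quantization error measured only on the non-spike entries."""
    return float(np.mean((quantize_per_tensor(x) - x)[normal] ** 2))

err_spiky = mse_on_normal(spiky)

clamp = 6.0                                    # assumed clamp threshold
clamped = np.clip(spiky, -clamp, clamp)        # spike removed, other entries unchanged in practice
err_clamped = mse_on_normal(clamped)

print(f"MSE on non-spike entries with spike:    {err_spiky:.6f}")
print(f"MSE on non-spike entries after clamping: {err_clamped:.6f}")
```

With the spike present, the quantization step is so coarse that most ordinary activations round to zero. After clamping, the step shrinks by orders of magnitude, but the clamped token no longer carries its original value, which is the gap the abstract says the pre-computed template vectors are meant to fill.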

Yung-Chin Chen, Chung Peng Lee, Ze-Wei Liou, Naveen Verma • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Language Modeling | WikiText-2 (test) | -- | 2333 |
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy 87.7 | 708 |
| Text-to-Image Retrieval | Flickr30k (test) | Recall@1 84.9 | 525 |
| Image-to-Text Retrieval | Flickr30k (test) | R@1 65.9 | 472 |
| Common Sense Reasoning | CSR (ARC-Easy, ARC-Challenge, BoolQ, PIQA, SIQA, HellaSwag, OpenBookQA, WinoGrande), zero-shot, lm-evaluation-harness v0.4.2 | Accuracy 68.84 | 32 |
