DopQ-ViT: Towards Distribution-Friendly and Outlier-Aware Post-Training Quantization for Vision Transformers
About
Vision Transformers (ViTs) have gained significant attention, but their high computational cost limits their practical application. While post-training quantization (PTQ) reduces model size and speeds up inference, it often degrades performance, especially in low-bit settings. We identify two key reasons for this degradation: 1) existing quantization methods fail to align with the power-law distribution of post-Softmax activations, and 2) reparameterizing post-LayerNorm activations leads to a performance drop because outliers strongly influence the scaling factors. To address these challenges, we propose DopQ-ViT, a Distribution-friendly and Outlier-aware Post-training Quantization method for ViTs. First, DopQ-ViT introduces the Tan Quantizer (TanQ), which better preserves the power-law distribution of post-Softmax activations by focusing more on values near 1. Second, DopQ-ViT presents the MAD-guided Optimal Scaling Factor (MOSF), which selects the optimal scaling factor without introducing additional computation. Extensive experiments across various ViT models and quantization settings demonstrate that DopQ-ViT, with the help of TanQ and MOSF, outperforms previous PTQ methods on both classification and detection tasks.
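To make the two ideas concrete, below is a minimal PyTorch sketch, not the paper's implementation. It assumes a tan-warped uniform grid as one plausible reading of TanQ (the `alpha` knob and all function names are hypothetical), and it reads "MAD-guided" as picking the per-tensor scale that minimizes the mean absolute deviation from the per-channel candidates, which is their median; the paper's exact parameterization may differ.

```python
import math
import torch

def tanq_quantize(x: torch.Tensor, n_bits: int = 4, alpha: float = 1.4) -> torch.Tensor:
    """Fake-quantize post-Softmax activations x in [0, 1] on a tan-warped grid.

    The grid is uniform in the warped domain tan(alpha * x) / tan(alpha),
    whose slope grows as x -> 1 (for alpha < pi/2), so quantization levels
    cluster near 1 -- where the rare but important attention values live.
    `alpha` is an assumed knob, not a parameter from the paper.
    """
    q_max = 2 ** n_bits - 1
    # Warp to [0, 1]: steep near x = 1, so rounding there is finer.
    warped = torch.tan(alpha * x.clamp(0.0, 1.0)) / math.tan(alpha)
    q = torch.round(warped * q_max).clamp(0, q_max)
    # De-quantize by inverting the warp.
    return torch.atan(q / q_max * math.tan(alpha)) / alpha

def mad_optimal_scale(per_channel_scales: torch.Tensor) -> torch.Tensor:
    """Collapse per-channel scale candidates into one per-tensor scale.

    The median minimizes the mean absolute deviation over candidates, so a
    few outlier channels cannot skew it the way they would skew a mean --
    one plausible reading of a MAD-guided selection.
    """
    return per_channel_scales.median()

# Usage sketch: quantize a toy attention map and pick a robust scale.
attn = torch.softmax(torch.randn(2, 8, 16, 16), dim=-1)
attn_q = tanq_quantize(attn, n_bits=4)
scale = mad_optimal_scale(torch.tensor([0.02, 0.021, 0.019, 0.9]))  # 0.9 is an outlier
```

For contrast, a log2 quantizer concentrates levels near 0; the tan warp instead spends its resolution near 1, matching the abstract's claim that TanQ focuses on the large post-Softmax values.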
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Camera pose estimation | CO3D v2 (test) | AUC@30 | 88.9 | 54 |
| Instance Segmentation | COCO | APm | 43.7 | 32 |
| Pointmap Regression | DTU | Mean Accuracy | 1.2 | 26 |