DopQ-ViT: Towards Distribution-Friendly and Outlier-Aware Post-Training Quantization for Vision Transformers
About
Vision Transformers (ViTs) have gained significant attention, but their high computational cost limits their practical application. While post-training quantization (PTQ) reduces model size and speeds up inference, it often degrades performance, especially in low-bit settings. We identify two key reasons for this degradation: 1) existing quantization methods fail to align with the power-law distribution of post-Softmax activations, and 2) reparameterizing post-LayerNorm activations leads to a performance drop because outliers strongly influence the scaling factors. To address these challenges, we propose DopQ-ViT, a Distribution-friendly and Outlier-aware Post-training Quantization method for ViTs. First, DopQ-ViT introduces the Tan Quantizer (TanQ), which better preserves the power-law distribution of post-Softmax activations by focusing more on values near 1. Second, DopQ-ViT presents the MAD-guided Optimal Scaling Factor (MOSF), which selects the optimal scaling factor without introducing additional computation. Extensive experiments across various ViT models and quantization settings demonstrate that DopQ-ViT, with the help of TanQ and MOSF, outperforms previous PTQ methods on both classification and detection tasks.
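To make the two ideas concrete, below is a minimal PyTorch sketch, not the paper's implementation. It assumes a tan-warped uniform grid as one plausible reading of TanQ (the `alpha` knob and all function names are hypothetical), and it reads "MAD-guided" as picking the per-tensor scale that minimizes the mean absolute deviation from the per-channel candidates, which is their median; the paper's exact parameterization may differ.

```python
import math
import torch

def tanq_quantize(x: torch.Tensor, n_bits: int = 4, alpha: float = 1.4) -> torch.Tensor:
    """Fake-quantize post-Softmax activations x in [0, 1] on a tan-warped grid.

    The grid is uniform in the warped domain tan(alpha * x) / tan(alpha),
    whose slope grows as x -> 1 (for alpha < pi/2), so quantization levels
    cluster near 1 -- where the rare but important attention values live.
    `alpha` is an assumed knob, not a parameter from the paper.
    """
    q_max = 2 ** n_bits - 1
    # Warp to [0, 1]: steep near x = 1, so rounding there is finer.
    warped = torch.tan(alpha * x.clamp(0.0, 1.0)) / math.tan(alpha)
    q = torch.round(warped * q_max).clamp(0, q_max)
    # De-quantize by inverting the warp.
    return torch.atan(q / q_max * math.tan(alpha)) / alpha

def mad_optimal_scale(per_channel_scales: torch.Tensor) -> torch.Tensor:
    """Collapse per-channel scale candidates into one per-tensor scale.

    The median minimizes the mean absolute deviation over candidates, so a
    few outlier channels cannot skew it the way they would skew a mean --
    one plausible reading of a MAD-guided selection.
    """
    return per_channel_scales.median()

# Usage sketch: quantize a toy attention map and pick a robust scale.
attn = torch.softmax(torch.randn(2, 8, 16, 16), dim=-1)
attn_q = tanq_quantize(attn, n_bits=4)
scale = mad_optimal_scale(torch.tensor([0.02, 0.021, 0.019, 0.9]))  # 0.9 is an outlier
```

For contrast, a log2 quantizer concentrates levels near 0; the tan warp instead spends its resolution near 1, matching the abstract's claim that TanQ focuses on the large post-Softmax values.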
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Camera pose estimation | CO3D v2 (test) | AUC@30 | 88.9 | 54 |
| Instance Segmentation | COCO | APm | 43.7 | 32 |
| Pointmap Regression | DTU | Mean Accuracy | 1.2 | 26 |