
MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models

About

Post-training quantization (PTQ) methods based on computational invariance have demonstrated remarkable advances for Large Language Models (LLMs); however, their application to Multimodal Large Language Models (MLLMs) presents substantial challenges. In this paper, we analyze SmoothQuant as a case study and identify two critical issues: Smoothing Misalignment and Cross-Modal Computational Invariance. To address these issues, we propose Modality-Aware Smoothing Quantization (MASQuant), a novel framework that introduces (1) Modality-Aware Smoothing (MAS), which learns separate, modality-specific smoothing factors to prevent Smoothing Misalignment, and (2) Cross-Modal Compensation (CMC), which addresses Cross-Modal Computational Invariance by using SVD whitening to transform multi-modal activation differences into low-rank forms, enabling unified quantization across modalities. MASQuant demonstrates stable quantization performance across both dual-modal and tri-modal MLLMs. Experimental results show that MASQuant is competitive with state-of-the-art PTQ algorithms. Source code: https://github.com/alibaba/EfficientAI.

Lulu Hu, Wenhu Xiao, Xin Chen, Xinhua Xu, Bowen Xu, Kun Li, Yongliang Tao • 2026
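
To make the abstract's starting point concrete, here is a minimal NumPy sketch of the standard SmoothQuant identity, Y = XW = (X diag(s)^-1)(diag(s) W), with s_j = max|X_j|^alpha / max|W_j|^(1-alpha). It is not the MASQuant implementation: the abstract says MAS learns its factors, whereas this sketch uses the SmoothQuant closed-form heuristic. The activation tensors, their scales, and the helper names `smoothing_factors` and `smooth` are all assumptions made for illustration.

```python
import numpy as np


def smoothing_factors(x, w, alpha=0.5, eps=1e-8):
    """SmoothQuant-style factors: s_j = max|x_j|^alpha / max|w_j|^(1 - alpha)."""
    act_max = np.abs(x).max(axis=0)  # per input channel, over tokens
    w_max = np.abs(w).max(axis=1)    # per input channel, over output features
    return np.maximum(act_max, eps) ** alpha / np.maximum(w_max, eps) ** (1 - alpha)


def smooth(x, w, s):
    """Divide activations by s and scale the matching weight rows by s."""
    return x / s, w * s[:, None]


rng = np.random.default_rng(0)
d_in, d_out = 8, 4
w = rng.normal(size=(d_in, d_out))

# Hypothetical activations: vision tokens are given a much wider range than
# text tokens. The factor 20 is an illustrative assumption, not a measurement.
x_text = rng.normal(size=(16, d_in))
x_vision = 20.0 * rng.normal(size=(16, d_in))

s_text = smoothing_factors(x_text, w)
s_vision = smoothing_factors(x_vision, w)
s_shared = smoothing_factors(np.concatenate([x_text, x_vision]), w)

# Computational invariance: smoothing leaves the layer output unchanged.
x_s, w_s = smooth(x_text, w, s_text)
invariance_holds = np.allclose(x_s @ w_s, x_text @ w)

# A single factor computed over both modalities follows the wider one.
shared_follows_vision = np.allclose(s_shared, s_vision)

# Separate factors imply a different smoothed weight for each modality.
_, w_text = smooth(x_text, w, s_text)
_, w_vision = smooth(x_vision, w, s_vision)
weights_differ = not np.allclose(w_text, w_vision)
```

In this toy setting the shared factor is set entirely by the wider-range modality, and per-modality factors give two different smoothed weight matrices for the same layer. These plausibly correspond to the two issues the abstract names, though the abstract itself does not define them in this much detail.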

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Visual Question Answering | VizWiz | Accuracy | 71.5 | 1820 |
| Visual Question Answering | TextVQA | Accuracy | 77 | 1453 |
| Text-based Visual Question Answering | TextVQA | Accuracy | 82.6 | 962 |
| Science Question Answering | ScienceQA | Accuracy | 85.7 | 791 |
| Optical Character Recognition | OCRBench | Score | 72.8 | 433 |
| Multimodal Understanding | SEED | Accuracy | 69.5 | 216 |
| Multimodal Optical Character Recognition | OCRBench | Recognition Score | 84.6 | 66 |
| Vision Understanding | MMMU | Accuracy | 49.9 | 65 |
| Scientific Question Answering | ScienceQA | Accuracy | 88.6 | 61 |
| Multimodal Understanding | MMMU | Accuracy (MMMU) | 46.7 | 52 |
Showing 10 of 15 rows
