
MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models

About

Post-training quantization (PTQ) with computational invariance for Large Language Models (LLMs) has demonstrated remarkable advances; however, its application to Multimodal Large Language Models (MLLMs) presents substantial challenges. In this paper, we analyze SmoothQuant as a case study and identify two critical issues: Smoothing Misalignment and Cross-Modal Computational Invariance. To address them, we propose Modality-Aware Smoothing Quantization (MASQuant), a novel framework that introduces (1) Modality-Aware Smoothing (MAS), which learns separate, modality-specific smoothing factors to prevent Smoothing Misalignment, and (2) Cross-Modal Compensation (CMC), which addresses Cross-Modal Computational Invariance by using SVD whitening to transform multimodal activation differences into low-rank forms, enabling unified quantization across modalities. MASQuant delivers stable quantization performance across both dual-modal and tri-modal MLLMs, and experimental results show it is competitive with state-of-the-art PTQ algorithms. Source code: https://github.com/alibaba/EfficientAI.
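The two components can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the SmoothQuant-style per-channel factor (`s_j = max|X_j|^alpha / max|W_j|^(1-alpha)`) is the published SmoothQuant heuristic, while computing one factor vector per modality ("MAS") and taking a truncated SVD of the cross-modal activation difference ("CMC") are simplified stand-ins for the learned factors and SVD whitening described in the abstract. All data, shapes, and the rank `r` are made up for the example.

```python
import numpy as np

def smoothing_factors(acts, weights, alpha=0.5):
    """SmoothQuant-style per-channel smoothing factors.

    acts:    (tokens, d_in) activation matrix
    weights: (d_in, d_out) weight matrix
    Migrates activation outliers into the weights via s_j.
    """
    a = np.abs(acts).max(axis=0)            # per-input-channel activation max
    w = np.abs(weights).max(axis=1)         # per-input-channel weight max
    return (a ** alpha) / np.maximum(w ** (1 - alpha), 1e-8)

rng = np.random.default_rng(0)
d = 16
W = rng.normal(size=(d, d))

# Hypothetical activations: each modality has outliers in *different* channels,
# which is why a single shared factor vector would be misaligned.
X_text = rng.normal(size=(32, d)); X_text[:, 3]  *= 20.0
X_img  = rng.normal(size=(32, d)); X_img[:, 11] *= 20.0

# Modality-Aware Smoothing (sketch): one factor vector per modality.
s = {m: smoothing_factors(X, W) for m, X in
     {"text": X_text, "image": X_img}.items()}
X_text_s = X_text / s["text"]               # smoothed activations; the matching
X_img_s  = X_img  / s["image"]              # weights would be s[m][:, None] * W

# Cross-Modal Compensation (sketch): approximate the cross-modal activation
# difference with its top-r SVD components, i.e. a low-rank correction term.
D = X_text_s - X_img_s
U, S, Vt = np.linalg.svd(D, full_matrices=False)
r = 2                                        # illustrative rank budget
D_lowrank = (U[:, :r] * S[:r]) @ Vt[:r]

print("outlier channel max before/after smoothing:",
      np.abs(X_text[:, 3]).max(), np.abs(X_text_s[:, 3]).max())
```

Smoothing shrinks each modality's own outlier channels, so both modalities' activations fit a shared quantization grid; the low-rank term `D_lowrank` stands in for the compensation that absorbs what still differs between modalities.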

Lulu Hu, Wenhu Xiao, Xin Chen, Xinhua Xu, Bowen Xu, Kun Li, Yongliang Tao • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Visual Question Answering | VizWiz | Accuracy | 71.5 | 1525 |
| Text-based Visual Question Answering | TextVQA | Accuracy | 82.6 | 807 |
| Multimodal Optical Character Recognition | OCRBench | Recognition Score | 84.6 | 66 |
| Vision Understanding | MMMU | Accuracy | 49.9 | 65 |
| Scientific Question Answering | ScienceQA | Accuracy | 88.6 | 61 |
| Multimodal Understanding | MMMU | Accuracy | 46.7 | 38 |
| Voice Recognition | LibriSpeech | WER | 2.7 | 34 |
| Vision-Audio-Text | OmniBench | Accuracy | 46.9 | 34 |
| Audio-Text | WenetSpeech | WER | 6.9 | 34 |
