
Teaching LMMs for Image Quality Scoring and Interpreting

About

Image quality scoring and interpreting are two fundamental components of Image Quality Assessment (IQA). The former quantifies image quality, while the latter enables descriptive question answering about image quality. Traditionally, these two tasks have been addressed independently. However, from the perspective of the Human Visual System (HVS) and the Perception-Decision Integration Model, they are inherently interconnected: interpreting serves as the foundation for scoring, while scoring provides an abstract summary of interpreting. Thus, unifying these capabilities within a single model is both intuitive and logically coherent. In this paper, we propose Q-SiT (Quality Scoring and Interpreting joint Teaching), a unified framework that enables large multimodal models (LMMs) to learn both image quality scoring and interpreting simultaneously. We achieve this by transforming conventional IQA datasets into learnable question-answering datasets and incorporating human-annotated quality interpreting data for training. Furthermore, we introduce an efficient scoring & interpreting balance strategy, which first determines the optimal data mix ratio on lightweight LMMs and then maps this ratio to primary LMMs for fine-tuning adjustment. This strategy not only mitigates task interference and enhances cross-task knowledge transfer but also significantly reduces computational costs compared to direct optimization on full-scale LMMs. With this joint learning framework and corresponding training strategy, we develop Q-SiT, the first model capable of simultaneously performing image quality scoring and interpreting tasks, along with its lightweight variant, Q-SiT-mini. Experimental results demonstrate that Q-SiT achieves strong performance in both tasks, with superior generalization ability in IQA. Project page: https://github.com/Q-Future/Q-SiT.
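The two data steps the abstract mentions (turning score-labelled IQA data into question-answer pairs, and mixing scoring and interpreting data at a chosen ratio) can be illustrated roughly as follows. This is only a sketch under stated assumptions, not the authors' implementation: the five rating words, the question template, the 0-100 score range, and the helper names `mos_to_level`, `scoring_sample`, and `mix` are all hypothetical and do not come from the abstract.

```python
import random

# Hypothetical five-level rating vocabulary (an assumption, not from the source).
LEVELS = ["bad", "poor", "fair", "good", "excellent"]


def mos_to_level(mos, lo=0.0, hi=100.0):
    """Map a mean opinion score in [lo, hi] to one of five equal-width bins."""
    frac = (mos - lo) / (hi - lo)
    idx = int(frac * len(LEVELS))
    # Clamp so that mos == hi (or out-of-range values) still map to a valid level.
    idx = max(0, min(idx, len(LEVELS) - 1))
    return LEVELS[idx]


def scoring_sample(image_path, mos):
    """Turn one (image, score) pair into a question-answer training sample."""
    return {
        "image": image_path,
        "question": "How would you rate the quality of this image?",  # assumed template
        "answer": f"The quality of this image is {mos_to_level(mos)}.",
        "task": "scoring",
    }


def mix(scoring, interpreting, scoring_ratio, total, seed=0):
    """Draw `total` samples, a `scoring_ratio` fraction of them from the scoring set."""
    rng = random.Random(seed)
    n_score = round(total * scoring_ratio)
    n_interp = total - n_score
    # Sampling with replacement keeps the sketch valid for small source sets.
    batch = rng.choices(scoring, k=n_score) + rng.choices(interpreting, k=n_interp)
    rng.shuffle(batch)
    return batch
```

In the strategy the abstract describes, a value such as `scoring_ratio` would be searched on a lightweight LMM and then carried over to the larger model; how that mapping is done is not specified here.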

Zicheng Zhang, Haoning Wu, Ziheng Jia, Weisi Lin, Guangtao Zhai • 2025

Related benchmarks

Task | Dataset | Result | Rank
Spatial Aesthetic Image Quality Assessment | SA-BENCH | Layout PLCC 0.141 | 25
Distortion type classification | PANDABENCH (Hard set) | Accuracy 16 | 14
Distortion Identification | PANDABENCH Easy | Accuracy 18 | 14
