
IQA-Spider: Unifying Multi-Granularity Image Quality Assessment with Reasoning, Grounding and Referring

About

We present IQA-Spider, the first image quality assessment (IQA) framework that unifies reasoning, grounding, and referring within a single large multimodal model (LMM) for multi-granularity quality understanding. Existing LMM-based IQA methods typically support only some perception dimensions, such as quality description and question answering (i.e., reasoning) or pixel-level grounding. This limitation largely stems from the absence of (i) a unified task and data formulation and (ii) effective optimization paradigms for multi-granularity learning. To address these limitations, we formulate a rigorous four-task paradigm covering global quality description, local quality description, pixel-level grounding, and region-level referring. Based on this formulation, we construct a corresponding IQA dataset with a scalable, automatic annotation pipeline, providing a solid foundation for unified multi-granularity learning. To further enable unified perception, we adopt a conflict-free two-stage design that progressively extends text-level multi-granularity understanding to pixel-level grounding: (i) the first stage equips the model with fine-grained text-level reasoning across multiple IQA tasks, and (ii) the second stage introduces a training-free text-to-point grounding paradigm, which bridges textual semantics and pixel-level perception by mapping token logits to spatial coordinates. Together, these components yield IQA-Spider, which provides unified, explainable, multi-granularity image quality assessment. Extensive experiments across multiple benchmarks demonstrate strong performance, validating the effectiveness and versatility of the proposed formulation and framework.

Xinge Peng, Yiting Lu, Xin Li, Zhibo Chen • 2026
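The abstract describes the second stage only as "mapping token logits to spatial coordinates" and gives no further detail. As a rough illustration of what such a mapping could look like, the sketch below assumes (hypothetically) that the model exposes one logit per discretized coordinate bin along each axis, and converts them to a pixel location by taking the softmax-weighted average of bin centres. The function names, the binning scheme, and the soft-argmax readout are all assumptions made for illustration, not the authors' method.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_coord(logits, size):
    """Softmax-weighted average of bin centres, scaled to `size` pixels.

    Assumption: bin i of n covers [i/n, (i+1)/n) of the axis, so its
    centre sits at (i + 0.5) / n.
    """
    probs = softmax(logits)
    n = len(probs)
    return size * sum(p * (i + 0.5) / n for i, p in enumerate(probs))


def logits_to_point(x_logits, y_logits, width, height):
    """Hypothetical readout: per-axis coordinate-bin logits -> (x, y) in pixels."""
    return expected_coord(x_logits, width), expected_coord(y_logits, height)


# Uniform logits along x give the image centre; a sharp peak in the
# last of four y bins gives the centre of that bin.
x, y = logits_to_point([0.0, 0.0, 0.0, 0.0], [-50.0, -50.0, -50.0, 50.0], 100, 100)
```

A readout of this kind needs no extra training because it only reinterprets logits the model already produces, which is consistent with the "training-free" description, though the actual design may differ.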

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Quality Assessment | KADID-10K | SRCC: 0.815 | 62 |
| Vision Question Answering | Q-Bench LLVisionQA 1.0 (dev) | Overall Score: 74.45 | 29 |
| Global Image Quality Description | IQA-Spider | Global Description Score (GPT-4V): 7.12 | 9 |
| Local Image Quality Description | IQA-Spider | Local Description Score: 7.1 | 9 |
| Quality Referring | IQA-Spider short | Accuracy (Short Ref): 59.4 | 9 |
| Quality Referring | IQA-Spider long | Ref_long Accuracy: 48.4 | 9 |
| Visual quality grounding | IQA-Spider | GPT-4V Grounding Score: 2.41 | 9 |
| Visual quality grounding | Q-Ground (test) | mIoU: 33.8 | 6 |
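Two of the metrics listed above have standard definitions that are easy to state in code. The sketch below computes SRCC (Spearman rank correlation, here as the Pearson correlation of average ranks) and a class-averaged mIoU over flattened label maps. It is a minimal illustration using only the standard library; the helper names are hypothetical, and the exact averaging protocol each benchmark uses (for example, per image versus per class, or reporting as a percentage) is not stated on this page and may differ.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def srcc(predicted, reference):
    """Spearman rank correlation between predicted and reference scores."""
    return pearson(average_ranks(predicted), average_ranks(reference))


def mean_iou(pred_labels, true_labels, num_classes):
    """Mean IoU over classes that occur in the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred_labels, true_labels) if p == c and t == c)
        union = sum(1 for p, t in zip(pred_labels, true_labels) if p == c or t == c)
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Perfectly monotone predictions give SRCC = 1 regardless of scale.
rho = srcc([0.1, 0.4, 0.5, 0.9], [10, 20, 30, 40])
# Class 0 IoU = 1/2, class 1 IoU = 2/3, so the mean is 7/12.
miou = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

Because SRCC depends only on rank order, it rewards a model for ordering images correctly by quality even when its raw scores are on a different scale from the human ratings.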
