IQA-Spider: Unifying Multi-Granularity Image Quality Assessment with Reasoning, Grounding and Referring

About

We present IQA-Spider, the first image quality assessment (IQA) framework that unifies reasoning, grounding, and referring in a single large multimodal model (LMM) for multi-granularity quality understanding. Existing LMM-based IQA methods typically support only some perception dimensions, such as quality description and question answering (i.e., reasoning) or pixel-level grounding. This limitation largely stems from the absence of (i) a unified task and data formulation and (ii) effective optimization paradigms for multi-granularity learning. To address both gaps, we formulate a rigorous four-task paradigm covering global quality description, local quality description, pixel-level grounding, and region-level referring. Based on this formulation, we construct a corresponding IQA dataset with a scalable, automatic annotation pipeline, providing a solid foundation for unified multi-granularity learning. To further enable unified perception, we adopt a conflict-free two-stage design that progressively extends text-level multi-granularity understanding to pixel-level grounding: (i) the first stage equips the model with fine-grained text-level reasoning across multiple IQA tasks, and (ii) the second stage introduces a training-free text-to-point grounding paradigm, which bridges textual semantics and pixel-level perception by mapping token logits to spatial coordinates. Together, these components yield IQA-Spider, a unified framework for multi-granularity, explainable image quality assessment. Extensive experiments across multiple benchmarks demonstrate strong performance, validating the effectiveness and versatility of the proposed formulation and framework.
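The abstract states only that the second stage maps token logits to spatial coordinates; it does not describe the mechanism. As a rough illustration of what such a mapping could look like, the sketch below assumes (hypothetically) that the model exposes logits over equally spaced coordinate-bin tokens for each axis and takes the softmax-weighted expectation of the bin centers. The function names, the bin layout, and the expectation rule are all assumptions made for illustration, not the authors' method.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_coordinate(bin_logits, extent):
    """Expected pixel position along one axis.

    Assumption: bin_logits[i] scores the i-th of n equally spaced
    coordinate bins covering [0, extent). This layout is hypothetical.
    """
    probs = softmax(bin_logits)
    n = len(probs)
    centers = [(i + 0.5) * extent / n for i in range(n)]
    return sum(p * c for p, c in zip(probs, centers))


def logits_to_point(x_logits, y_logits, width, height):
    """Combine per-axis expectations into one (x, y) point, no training needed."""
    return (
        logits_to_coordinate(x_logits, width),
        logits_to_coordinate(y_logits, height),
    )


# Uniform x logits give the horizontal center; peaked y logits follow the peak.
point = logits_to_point([0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 50.0, 0.0], 100, 100)
```

Taking an expectation rather than an argmax gives sub-bin coordinates when the probability mass is spread across neighboring bins, which is one plausible reason to work with logits instead of decoded text.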

Xinge Peng, Yiting Lu, Xin Li, Zhibo Chen • 2026

Related benchmarks

Task	Dataset	Result
Image Quality Assessment	KADID-10K	SRCC 0.815	62
Vision Question Answering	Q-Bench LLVisionQA 1.0 (dev)	Overall Score 74.45	29
Global Image Quality Description	IQA-Spider	Global Description Score (GPT-4V) 7.12	9
Local Image Quality Description	IQA-Spider	Local Description Score 7.1	9
Quality Referring	IQA-Spider short	Accuracy (Short Ref) 59.4	9
Quality Referring	IQA-Spider long	Ref_long Accuracy 48.4	9
Visual Quality Grounding	IQA-Spider	GPT-4V Grounding Score 2.41	9
Visual Quality Grounding	Q-Ground (test)	mIoU 33.8	6
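
For readers unfamiliar with two of the metrics in the table, the sketch below shows the standard definitions of SRCC (Spearman rank correlation between predicted scores and subjective scores) and mean IoU over binary masks. It is a generic illustration: the exact evaluation protocol behind the numbers above (for example, how masks are grouped or averaged on Q-Ground) is not given here, and the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def srcc(predicted, subjective):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(subjective))


def mean_iou(pred_masks, gt_masks):
    """Mean intersection-over-union across flattened binary masks."""
    ious = []
    for pred, gt in zip(pred_masks, gt_masks):
        inter = sum(1 for p, g in zip(pred, gt) if p and g)
        union = sum(1 for p, g in zip(pred, gt) if p or g)
        # Convention assumed here: two empty masks count as a perfect match.
        ious.append(inter / union if union else 1.0)
    return sum(ious) / len(ious)
```

SRCC depends only on the ordering of the predictions, so it rewards a model that ranks images correctly even if its raw scores are on a different scale from the subjective ratings.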
