Enhancing Image Quality Assessment Ability of LMMs via Retrieval-Augmented Generation

About

Large Multimodal Models (LMMs) have recently shown remarkable promise in low-level visual perception tasks, particularly Image Quality Assessment (IQA), where they demonstrate strong zero-shot capability. However, achieving state-of-the-art performance often requires computationally expensive fine-tuning, which aims to align the distribution of quality-related tokens in the output with image quality levels. Inspired by recent training-free work on LMMs, we introduce IQARAG, a novel, training-free framework that enhances LMMs' IQA ability. IQARAG leverages Retrieval-Augmented Generation (RAG) to retrieve semantically similar but quality-variant reference images, together with their Mean Opinion Scores (MOSs), for the input image. The retrieved images and the input image are integrated into a specific prompt, so that the retrieved images give the LMM a visual perception anchor for the IQA task. IQARAG comprises three key phases: Retrieval Feature Extraction, Image Retrieval, and Integration & Quality Score Generation. Extensive experiments across multiple diverse IQA datasets, including KADID, KonIQ, LIVE Challenge, and SPAQ, demonstrate that IQARAG effectively boosts the IQA performance of LMMs, offering a resource-efficient alternative to fine-tuning for quality assessment.
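The retrieval and prompt-integration phases described above can be illustrated with a minimal sketch. This is not the authors' implementation: the feature vectors, the cosine-similarity ranking, the `retrieve_references` and `build_prompt` helpers, and the prompt wording are all assumptions made for illustration. The sketch ranks only by similarity and does not model how quality-variant references are selected, and a real LMM would receive the images themselves rather than text identifiers.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical reference entry: (image_id, feature_vector, mos).
Reference = Tuple[str, Sequence[float], float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_references(
    query_feat: Sequence[float], pool: List[Reference], k: int = 3
) -> List[Tuple[str, float]]:
    """Return (image_id, mos) for the k pool entries most similar to the query."""
    ranked = sorted(
        pool,
        key=lambda ref: cosine_similarity(query_feat, ref[1]),
        reverse=True,
    )
    return [(image_id, mos) for image_id, _, mos in ranked[:k]]


def build_prompt(query_id: str, references: List[Tuple[str, float]]) -> str:
    """Assemble a text prompt listing each reference with its MOS (assumed wording)."""
    lines = [
        f"Reference image {image_id}: MOS = {mos:.2f}"
        for image_id, mos in references
    ]
    lines.append(f"Rate the quality of image {query_id} on the same scale.")
    return "\n".join(lines)


# Toy pool with made-up features and scores, for illustration only.
pool = [
    ("ref_a", [1.0, 0.0], 4.2),
    ("ref_b", [0.9, 0.1], 1.8),
    ("ref_c", [0.0, 1.0], 3.0),
]
refs = retrieve_references([1.0, 0.0], pool, k=2)
prompt = build_prompt("query", refs)
```

In this toy pool the two retrieved references have similar features but very different scores, which is the kind of anchor the abstract describes. Any real system would replace the hand-written vectors with features from an image encoder.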

Kang Fu, Huiyu Duan, Zicheng Zhang, Yucheng Zhu, Jun Zhao, Xiongkuo Min, Jia Wang, Guangtao Zhai • 2026

Related benchmarks

| Task | Dataset | Result (SRCC) | Rank |
|---|---|---|---|
| Image Quality Assessment | KonIQ-10k (test) | 0.911 | 91 |
| Image Quality Assessment | KADID-10k (test) | 0.7707 | 91 |
| Image Quality Assessment | SPAQ (test) | 0.8427 | 77 |
| Image Quality Assessment | LIVE Challenge (LIVEC) (test) | 0.848 | 18 |
| Image Quality Assessment | Combined Dataset (COM.) (test) | 0.812 | 18 |
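The benchmark results above are reported as SRCC, the Spearman rank correlation coefficient between predicted quality scores and ground-truth MOS values. As a rough sketch of how such a number is computed, the code below ranks both lists (averaging ranks for ties) and takes the Pearson correlation of the ranks. The function names and the sample values in the usage note are illustrative and are not taken from the paper's evaluation code.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def srcc(predicted: Sequence[float], mos: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    if len(predicted) != len(mos) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rp, rm = average_ranks(predicted), average_ranks(mos)
    n = len(rp)
    mean_p, mean_m = sum(rp) / n, sum(rm) / n
    cov = sum((a - mean_p) * (b - mean_m) for a, b in zip(rp, rm))
    var_p = sum((a - mean_p) ** 2 for a in rp)
    var_m = sum((b - mean_m) ** 2 for b in rm)
    if var_p == 0.0 or var_m == 0.0:
        raise ValueError("SRCC is undefined when one sequence is constant")
    return cov / math.sqrt(var_p * var_m)
```

For example, `srcc([1, 2, 3], [1, 3, 2])` evaluates to 0.5, while any pair of sequences with identical ordering gives 1.0. SRCC depends only on rank order, so predictions do not need to be on the same numeric scale as the MOS values.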
