Reasoning as Representation: Rethinking Visual Reinforcement Learning in Image Quality Assessment

About

Reasoning-based image quality assessment (IQA) models trained through reinforcement learning (RL) exhibit exceptional generalization, yet the underlying mechanisms and critical factors driving this capability remain underexplored in current research. Moreover, despite their superior performance, these models incur inference energy usage and latency orders of magnitude higher than their earlier counterparts, restricting their deployment in specific scenarios. Through extensive experiments, this paper verifies and elaborates that through RL training, MLLMs leverage their reasoning capability to convert redundant visual representations into compact, cross-domain aligned text representations. This conversion is precisely the source of the generalization exhibited by these reasoning-based IQA models. Building on this fundamental insight, we propose a novel algorithm, RALI, which employs contrastive learning to directly align images with these generalizable text representations learned by RL. This approach eliminates the reliance on reasoning processes and even obviates the need to load an LLM. For the quality scoring task, this framework achieves generalization performance comparable to reasoning-based models while requiring less than 5% of their model parameters and inference time.
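To make the alignment idea concrete, here is a minimal sketch of an image-to-text contrastive (InfoNCE-style) loss. It is not the RALI implementation: the abstract only says that contrastive learning aligns images with the text representations learned by RL. The function names, the temperature value, and the use of cosine similarity are all assumptions chosen for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(image_embs, text_embs, temperature=0.07):
    """Image-to-text InfoNCE loss (hypothetical helper, not from the paper).

    The i-th image embedding is treated as the positive match for the i-th
    text embedding; every other text in the batch acts as a negative.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    total = 0.0
    for i, img in enumerate(imgs):
        # Temperature-scaled cosine similarity of image i to every text.
        logits = [sum(a * b for a, b in zip(img, t)) / temperature for t in txts]
        # Numerically stable log-sum-exp over the batch.
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        # Cross-entropy with the matching text as the target class.
        total += log_denom - logits[i]
    return total / len(imgs)
```

Under these assumptions, the loss is near zero when each image embedding points in the same direction as its paired text embedding, and it grows when the pairs are mismatched. At inference time only the image encoder and the stored text embeddings would be needed, which is consistent with the abstract's claim that no LLM has to be loaded.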

Shijie Zhao, Xuanyu Zhang, Weiqi Li, Junlin Li, Li Zhang, Tianfan Xue, Jian Zhang • 2025

Related benchmarks

Task | Dataset | Result | Rank
Image Quality Assessment | SPAQ | SRCC 0.918 | 250
Image Quality Assessment | CSIQ | SRCC 0.788 | 150
Image Quality Assessment | KADID | SRCC 0.916 | 128
Image Quality Assessment | PIPAL | SRCC 0.528 | 123
Image Quality Assessment | KonIQ | SRCC 0.922 | 119
Image Quality Assessment | LIVE-Wild | PLCC 0.896 | 47
Image Quality Assessment | AGIQA | SRCC 0.715 | 28
Image Quality Assessment Score Regression | AGIQA | PLCC 0.813 | 19
FR-IQA | CSIQ | SRCC 0.838 | 16
IQA Score Regression | LiveW | PLCC 0.881 | 5
Showing 10 of 13 rows
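
The SRCC and PLCC results listed above are Spearman rank-order and Pearson linear correlation coefficients, conventionally computed between a model's predicted quality scores and human opinion scores. The following is a minimal pure-Python sketch of both metrics; the function names are illustrative, and it omits the nonlinear logistic mapping that some IQA evaluation protocols apply before computing PLCC.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC) of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

SRCC depends only on the ordering of the scores, so any strictly increasing rescaling of the predictions leaves it unchanged, while PLCC also penalizes departures from a linear relationship.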
