Zoom-IQA: Image Quality Assessment with Reliable Region-Aware Reasoning
About
Image Quality Assessment (IQA) is a long-standing problem in computer vision. Previous methods typically either predict numerical scores without explanation or provide low-level descriptions without precise scores. Recent reasoning-based vision language models (VLMs) have shown strong potential for IQA by jointly generating quality descriptions and scores. However, existing VLM-based IQA methods often suffer from unreliable reasoning due to their limited ability to integrate visual and textual cues. In this work, we introduce Zoom-IQA, a VLM-based IQA model designed to explicitly emulate key cognitive behaviors: uncertainty awareness, region reasoning, and iterative refinement. Specifically, we present a two-stage training pipeline: 1) supervised fine-tuning (SFT) on our Grounded-Rationale-IQA (GR-IQA) dataset to teach the model to ground its assessments in key regions, and 2) reinforcement learning (RL) for dynamic policy exploration, stabilized by our KL-Coverage regularizer to prevent the collapse of reasoning and scoring diversity, and paired with a Progressive Re-sampling Strategy to mitigate annotation bias. Extensive experiments show that Zoom-IQA achieves improved robustness, explainability, and generalization. Its application to downstream tasks, such as image restoration, further demonstrates the effectiveness of Zoom-IQA.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Quality Assessment | SPAQ | SRCC | 0.9 | 191 |
| Image Quality Assessment | CSIQ | SRC | 0.754 | 138 |
| Image Quality Assessment | PIPAL | SRCC | 0.465 | 95 |
| Image Quality Assessment | KADID | SRCC | 0.7 | 95 |
| Image Quality Assessment | KonIQ | SRCC | 0.922 | 82 |
| Image Quality Assessment | LIVE-Wild | PLCC | 0.887 | 35 |
| Image Quality Assessment Score Regression | AGIQA | PLCC | 0.816 | 14 |
| Image Quality Description | KonIQ | Accuracy | 8.72 | 8 |
| Image Quality Description | SPAQ | Accuracy | 8.63 | 8 |
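
Most of the results above are correlation metrics: SRCC is the Spearman rank correlation coefficient and PLCC is the Pearson linear correlation coefficient, both computed between predicted quality scores and human mean opinion scores (MOS). The following is a minimal sketch of how the two are computed, not the evaluation code used for these results. The function names and the sample data are hypothetical, and the exact protocol behind the table (for example, whether predictions are passed through a nonlinear mapping before PLCC, as is common in IQA) is not stated in the source.

```python
from math import sqrt


def plcc(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def srcc(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return plcc(average_ranks(x), average_ranks(y))


# Hypothetical data: predictions that are monotonic in MOS but not linear.
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
mos = [1.0, 4.0, 9.0, 16.0, 25.0]

print("SRCC:", srcc(predicted, mos))
print("PLCC:", plcc(predicted, mos))
```

In this toy example the ordering of the predictions matches the ordering of the MOS values exactly, so SRCC reaches its maximum while PLCC stays below it because the relationship is not linear. This is why the two metrics are usually read together: SRCC measures how well a model ranks images, and PLCC measures how closely its scores track the subjective scale.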