
Revisiting Vision Language Foundations for No-Reference Image Quality Assessment

About

Large-scale vision-language pre-training has recently shown promise for no-reference image quality assessment (NR-IQA), yet the relative merits of modern Vision Transformer foundations remain poorly understood. In this work, we present the first systematic evaluation of six prominent pretrained backbones (CLIP, SigLIP2, DINOv2, DINOv3, Perception, and ResNet) for NR-IQA, each fine-tuned using an identical lightweight MLP head. Our study uncovers two previously overlooked factors: (1) SigLIP2 consistently achieves strong performance; and (2) the choice of activation function plays a surprisingly crucial role, particularly in improving the generalization ability of image quality assessment models. Notably, we find that simple sigmoid activations outperform the commonly used ReLU and GELU on several benchmarks. Motivated by this finding, we introduce a learnable activation selection mechanism that adaptively determines the nonlinearity for each channel, eliminating the need for manual activation design and achieving new state-of-the-art SRCC on CLIVE, KADID10K, and AGIQA3K. Extensive ablations confirm the benefits across architectures and regimes, establishing strong, resource-efficient NR-IQA baselines.
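The abstract does not spell out how the learnable activation selection works. One possible reading, sketched here purely as an assumption and not as the authors' implementation, is that each channel holds learnable logits over a set of candidate activations and applies their softmax-weighted mixture. The names `CANDIDATES`, `mixed_activation`, and `channel_logits` are hypothetical, and the candidate set (sigmoid, ReLU, GELU) is taken only from the activations the abstract mentions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def gelu(x):
    # Exact GELU via the Gaussian error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical candidate set; the paper may use a different one.
CANDIDATES = (sigmoid, relu, gelu)

def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_activation(features, channel_logits):
    """Apply a per-channel softmax-weighted mix of candidate activations.

    features: one scalar per channel.
    channel_logits: for each channel, one logit per candidate activation
    (these would be the learnable parameters in a trained model).
    """
    out = []
    for x, logits in zip(features, channel_logits):
        weights = softmax(logits)
        out.append(sum(w * f(x) for w, f in zip(weights, CANDIDATES)))
    return out
```

In this sketch, a channel whose logits strongly favor one candidate behaves almost exactly like that activation, while uniform logits give an even blend; in a real model the logits would be trained jointly with the MLP head.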

Ankit Yadav, Ta Duc Huy, Lingqiao Liu • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| No-Reference Image Quality Assessment | KADID-10K | SROCC 0.97 | 115 |
| Blind Image Quality Assessment | FLIVE | SRCC 0.556 | 115 |
| No-Reference Image Quality Assessment | KonIQ-10k | SROCC 0.953 | 111 |
| No-Reference Image Quality Assessment | SPAQ | SROCC 0.928 | 92 |
| No-Reference Image Quality Assessment | KonIQ-10k (test) | SRCC 0.808 | 35 |
| No-Reference Image Quality Assessment | CLIVE (test) | SROCC 0.899 | 34 |
| No-Reference Image Quality Assessment | CLIVE | SRCC 0.909 | 21 |
| No-Reference Image Quality Assessment | AGIQA3K | SRCC 0.878 | 10 |
| No-Reference Image Quality Assessment | AGIQA1K | SRCC 0.873 | 10 |
| No-Reference Image Quality Assessment | Average (CLIVE, KonIQ10K, FLIVE, SPAQ, AGIQA3K, AGIQA1K, KADID10K) | SRCC 86.2 | 5 |
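
The SRCC/SROCC figures reported for these benchmarks are Spearman rank correlation coefficients between predicted quality scores and human opinion scores. As a minimal sketch of how such a value can be computed (the function names `average_ranks` and `srcc` are illustrative, and the sample scores in the usage note are made up for demonstration, not taken from any dataset), ties receive their average rank and the coefficient is the Pearson correlation of the two rank vectors:

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def srcc(predicted, mos):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rp, rm = average_ranks(predicted), average_ranks(mos)
    n = len(rp)
    mean_p, mean_m = sum(rp) / n, sum(rm) / n
    cov = sum((a - mean_p) * (b - mean_m) for a, b in zip(rp, rm))
    var_p = sum((a - mean_p) ** 2 for a in rp)
    var_m = sum((b - mean_m) ** 2 for b in rm)
    return cov / math.sqrt(var_p * var_m)
```

For example, `srcc([0.1, 0.4, 0.3, 0.9], [1, 3, 2, 4])` yields a perfect correlation because both lists order the four images identically, even though the raw values differ in scale. This is why SRCC measures monotonic agreement with human rankings and not absolute score accuracy.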