ML-CLIPSim: Multi-Layer CLIP Similarity for Machine-Oriented Image Quality

About

We study full-reference image quality assessment from a machine-centric perspective, where images are evaluated by how well they preserve information for downstream models. We formulate machine-oriented quality as a latent machine utility and approximate it through pairwise predictive-consistency comparisons. To this end, we construct PCMP, a dataset of PSNR-matched distortion pairs labeled by consistency votes from multiple pretrained models. We further propose ML-CLIPSim, a differentiable quality metric built on a frozen CLIP visual encoder, which aggregates intermediate patch-token similarities and global image embeddings. Experiments on machine-preference benchmarks, human-IQA datasets, and learned image compression show that ML-CLIPSim better aligns with machine-oriented preferences than conventional fidelity and perceptual metrics, while remaining competitive for human quality prediction. Used as a compression distortion term, it improves rate-task trade-offs across multiple downstream tasks.

Feng Ding, Haisheng Fu, Jie Liang, Qihan Xu, Siyu Zhu, Jingning Han • 2026
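
The abstract describes ML-CLIPSim as aggregating intermediate patch-token similarities and global image embeddings from a frozen CLIP visual encoder, but this page does not give the formula. The sketch below is only one plausible reading, not the authors' definition: it assumes cosine similarity, a plain mean over patch tokens, uniform layer weights, and a hypothetical `global_weight` blend. The function names and the toy feature vectors are illustrative; in practice the features would come from the encoder, and an autodiff framework would be needed to use the score as a differentiable loss.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def multilayer_similarity(ref_layers, dist_layers, ref_global, dist_global,
                          layer_weights=None, global_weight=0.5):
    """Blend per-layer patch-token similarity with global-embedding similarity.

    ref_layers / dist_layers: one entry per layer, each a list of patch-token vectors.
    layer_weights and global_weight are assumptions made for this sketch.
    """
    n_layers = len(ref_layers)
    if layer_weights is None:
        layer_weights = [1.0 / n_layers] * n_layers  # assumed uniform weighting

    patch_term = 0.0
    for w, ref_tokens, dist_tokens in zip(layer_weights, ref_layers, dist_layers):
        # Compare tokens at the same spatial position, then average over the layer.
        sims = [cosine(r, d) for r, d in zip(ref_tokens, dist_tokens)]
        patch_term += w * sum(sims) / len(sims)

    global_term = cosine(ref_global, dist_global)
    return (1.0 - global_weight) * patch_term + global_weight * global_term


# Toy stand-ins for encoder features: 2 layers, 2 patch tokens each, 2-dim vectors.
ref_layers = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, -1.0]]]
dist_layers = [[[0.0, 1.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
ref_global = [1.0, 2.0]

score_same = multilayer_similarity(ref_layers, ref_layers, ref_global, ref_global)
score_dist = multilayer_similarity(ref_layers, dist_layers, ref_global, ref_global)
```

In this toy case the undistorted comparison scores close to 1, and the distorted one scores lower because half of its patch tokens are orthogonal to the reference.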

Related benchmarks

Task                                  Dataset                 Result          Rank
Image Quality Assessment              LIVE                    SRC 0.929       127
Image Quality Assessment Correlation  MPD Severe Distortion   SRCC 0.833      12
Image Quality Assessment Correlation  MPD (NSI)               SRCC 0.826      12
Image Quality Assessment              PCMP (test)             Accuracy 82.84  6
Image Quality Assessment Correlation  MPD Mild Distortion     SRCC 0.741      6
Image Quality Assessment Correlation  MPD YoN                 SRCC 0.569      6
Image Quality Assessment Correlation  MPD MCQ                 SRCC 0.775      6
Image Quality Assessment Correlation  MPD (CAP)               SRCC 0.792      6
Image Quality Assessment Correlation  MPD (Others)            SRCC 0.623      6
Image Quality Assessment Correlation  MPD VQA                 SRCC 0.449      6
Showing 10 of 11 rows
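
The results above are reported as SRCC (Spearman rank correlation coefficient) and, for PCMP, as accuracy. As a rough illustration of how such numbers are typically computed, the sketch below implements Spearman correlation as the Pearson correlation of average ranks, plus a pairwise preference accuracy. It assumes that PCMP accuracy is the percentage of pairs where the metric prefers the same image as the label; the page does not state this, and the function names and toy data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def srcc(metric_scores, reference_scores):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(metric_scores), average_ranks(reference_scores))


def pairwise_accuracy(score_a, score_b, prefers_a):
    """Percentage of pairs where the metric's preferred image matches the label."""
    correct = sum((a > b) == p for a, b, p in zip(score_a, score_b, prefers_a))
    return 100.0 * correct / len(prefers_a)


# Toy data: metric scores against reference quality ratings for four images.
rho = srcc([0.1, 0.4, 0.35, 0.8], [1, 3, 2, 4])

# Toy data: four image pairs (A, B) with labels saying whether A is preferred.
acc = pairwise_accuracy([0.9, 0.2, 0.7, 0.4], [0.5, 0.6, 0.8, 0.1],
                        [True, False, True, True])
```

Because SRCC depends only on rank order, a metric can reach a high value without matching the scale of the reference scores.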
