ML-CLIPSim: Multi-Layer CLIP Similarity for Machine-Oriented Image Quality
About
We study full-reference image quality assessment from a machine-centric perspective, where images are evaluated by how well they preserve information for downstream models. We formulate machine-oriented quality as a latent machine utility and approximate it through pairwise predictive-consistency comparisons. To this end, we construct PCMP, a dataset of PSNR-matched distortion pairs labeled by consistency votes from multiple pretrained models. We further propose ML-CLIPSim, a differentiable quality metric built on a frozen CLIP visual encoder, which aggregates intermediate patch-token similarities and global image embeddings. Experiments on machine-preference benchmarks, human-IQA datasets, and learned image compression show that ML-CLIPSim better aligns with machine-oriented preferences than conventional fidelity and perceptual metrics, while remaining competitive for human quality prediction. Used as a compression distortion term, it improves rate-task trade-offs across multiple downstream tasks.
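To make the aggregation idea concrete, here is a minimal sketch of a multi-layer similarity score. It is not the authors' implementation: the abstract only states that intermediate patch-token similarities and global image embeddings are aggregated, so the use of cosine similarity, the uniform averaging over layers, the function names, and the `global_weight` mixing parameter are all assumptions made for illustration. Features are passed in as plain Python lists; in practice they would come from the hidden states of a frozen CLIP visual encoder.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def multilayer_similarity(
    ref_layers: List[List[Vector]],
    dist_layers: List[List[Vector]],
    ref_global: Vector,
    dist_global: Vector,
    global_weight: float = 0.5,  # hypothetical mixing weight, not from the source
) -> float:
    """Mix per-layer patch-token similarity with global-embedding similarity.

    ref_layers[l][p] is the feature of patch token p at layer l for the
    reference image; dist_layers has the same layout for the distorted image.
    """
    layer_scores = []
    for ref_tokens, dist_tokens in zip(ref_layers, dist_layers):
        # Compare each patch token with the token at the same position.
        sims = [cosine(r, d) for r, d in zip(ref_tokens, dist_tokens)]
        layer_scores.append(sum(sims) / len(sims))
    # Assumption: every selected layer contributes equally.
    local_score = sum(layer_scores) / len(layer_scores)
    global_score = cosine(ref_global, dist_global)
    return (1.0 - global_weight) * local_score + global_weight * global_score
```

Under these assumptions the score is close to 1 when the distorted image yields the same features as the reference and drops as patch tokens or the global embedding drift. Every operation is a smooth function of the features, which is the property that would let a metric of this kind serve as a distortion term when training a learned codec.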
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Quality Assessment | LIVE | SRC | 0.929 | 127 |
| Image Quality Assessment Correlation | MPD Severe Distortion | SRCC | 0.833 | 12 |
| Image Quality Assessment Correlation | MPD (NSI) | SRCC | 0.826 | 12 |
| Image Quality Assessment | PCMP (test) | Accuracy | 82.84 | 6 |
| Image Quality Assessment Correlation | MPD Mild Distortion | SRCC | 0.741 | 6 |
| Image Quality Assessment Correlation | MPD YoN | SRCC | 0.569 | 6 |
| Image Quality Assessment Correlation | MPD MCQ | SRCC | 0.775 | 6 |
| Image Quality Assessment Correlation | MPD (CAP) | SRCC | 0.792 | 6 |
| Image Quality Assessment Correlation | MPD (Others) | SRCC | 0.623 | 6 |
| Image Quality Assessment Correlation | MPD VQA | SRCC | 0.449 | 6 |
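For readers unfamiliar with the two kinds of figures in the table, the sketch below shows how a Spearman rank correlation coefficient (SRCC) and a pairwise preference accuracy are commonly computed. It assumes that the PCMP accuracy is the percentage of pairs in which the metric ranks the two distorted images the same way as the label; the source does not spell out the protocol, and the `(score_a, score_b, label)` tuple layout is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def srcc(pred: Sequence[float], target: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(pred), average_ranks(target))


def pairwise_accuracy(pairs: Sequence[Tuple[float, float, int]]) -> float:
    """Percentage of pairs where the metric agrees with the label.

    Each pair is (score_a, score_b, label); label is 0 if image A is
    preferred and 1 if image B is preferred (hypothetical encoding).
    Equal scores count as a prediction for A.
    """
    correct = sum(1 for sa, sb, label in pairs if (sb > sa) == (label == 1))
    return 100.0 * correct / len(pairs)
```

For example, `pairwise_accuracy([(0.9, 0.2, 0), (0.3, 0.8, 1), (0.6, 0.5, 1), (0.1, 0.7, 1)])` agrees with the label on three of four pairs and returns `75.0`.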