Exploring CLIP for Assessing the Look and Feel of Images

About

Measuring the perception of visual content is a long-standing problem in computer vision. Many mathematical models have been developed to evaluate the look or quality of an image. Despite the effectiveness of such tools in quantifying degradations such as noise and blurriness levels, such quantification is loosely coupled with human language. When it comes to more abstract perception about the feel of visual content, existing methods can only rely on supervised models that are explicitly trained with labeled data collected via laborious user studies. In this paper, we go beyond the conventional paradigms by exploring the rich visual-language prior encapsulated in Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner. In particular, we discuss effective prompt designs and show an effective prompt pairing strategy to harness the prior. We also provide extensive experiments on controlled datasets and Image Quality Assessment (IQA) benchmarks. Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments. Code is available at https://github.com/IceClear/CLIP-IQA.
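The abstract mentions a prompt pairing strategy without spelling it out. The following is a minimal sketch of one way such a pairing could work, assuming an antonym pair of prompts (for example "Good photo." and "Bad photo.") whose cosine similarities to the image embedding are passed through a two-way softmax. The function names, the `temperature` value, and the toy vectors are hypothetical and stand in for real CLIP embeddings; this is not taken from the authors' repository.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def paired_prompt_score(image_emb, pos_emb, neg_emb, temperature=0.01):
    """Score in [0, 1]: softmax weight of the positive prompt over the pair.

    image_emb: embedding of the image (assumed to come from a CLIP image encoder)
    pos_emb:   embedding of the positive prompt, e.g. "Good photo." (assumption)
    neg_emb:   embedding of the antonym prompt, e.g. "Bad photo." (assumption)
    temperature: hypothetical scaling factor applied to the similarities
    """
    logit_pos = cosine(image_emb, pos_emb) / temperature
    logit_neg = cosine(image_emb, neg_emb) / temperature
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logit_pos, logit_neg)
    exp_pos = math.exp(logit_pos - m)
    exp_neg = math.exp(logit_neg - m)
    return exp_pos / (exp_pos + exp_neg)

# Toy 2-D vectors in place of real embeddings.
closer_to_good = paired_prompt_score([0.9, 0.1], [1.0, 0.0], [0.0, 1.0])
equidistant = paired_prompt_score([1.0, 1.0], [1.0, 0.0], [0.0, 1.0])
```

Using a pair rather than a single prompt makes the score relative: an image that is equally similar to both prompts lands at 0.5, whatever its absolute similarity to either one.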

Jianyi Wang, Kelvin C.K. Chan, Chen Change Loy • 2022

Related benchmarks

Task	Dataset	Metric	Value	(unlabeled in source)
Image Quality Assessment	SPAQ	SRCC	0.901	275
Image Quality Assessment	CSIQ	SRC	0.862	192
Image Quality Assessment	KADID	SRCC	65.4	164
Image Quality Assessment	PIPAL	SRCC	43.1	159
Video Quality Assessment	LIVE-VQC	SRCC	0.704	151
Image Quality Assessment	KonIQ	SRCC	0.905	148
No-Reference Image Quality Assessment	KADID-10K	SROCC	0.823	146
Image Quality Assessment	AGIQA-3K	SRCC	0.844	137
Blind Image Quality Assessment	FLIVE	SRCC	0.602	127
Image Quality Assessment	LIVE	SRC	0.95	127

Showing 10 of 162 rows
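
The metric labels in the table (SRCC, SRC, SROCC) are commonly used abbreviations for the Spearman rank-order correlation between predicted quality scores and human opinion scores. As a rough illustration of how such a number is obtained, here is a small standard-library sketch that ranks both lists (averaging tied ranks) and takes the Pearson correlation of the ranks. The function names and the sample scores are made up for the example and do not reproduce any row of the table.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def srcc(predicted, opinion):
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    rp, ro = average_ranks(predicted), average_ranks(opinion)
    n = len(rp)
    mean_p, mean_o = sum(rp) / n, sum(ro) / n
    cov = sum((a - mean_p) * (b - mean_o) for a, b in zip(rp, ro))
    var_p = sum((a - mean_p) ** 2 for a in rp)
    var_o = sum((b - mean_o) ** 2 for b in ro)
    return cov / math.sqrt(var_p * var_o)

# Hypothetical model scores and human opinion scores for four images.
model_scores = [0.10, 0.40, 0.35, 0.80]
human_scores = [1.0, 3.0, 2.0, 4.0]
rho = srcc(model_scores, human_scores)
```

Because only the ordering matters, a model whose scores sit on a different scale from the human ratings can still reach a high correlation as long as it ranks the images the same way.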
