QPT V2: Masked Image Modeling Advances Visual Scoring

About

Quality assessment and aesthetics assessment aim to evaluate the perceived quality and aesthetics of visual content. Current learning-based methods suffer greatly from the scarcity of labeled data and usually generalize sub-optimally. Although masked image modeling (MIM) has achieved noteworthy advancements across various high-level tasks (e.g., classification and detection), in this work we take a novel perspective and investigate its capabilities in terms of quality- and aesthetics-awareness. To this end, we propose quality- and aesthetics-aware pretraining (QPT V2), the first pretraining framework based on MIM that offers a unified solution to quality and aesthetics assessment. To perceive high-level semantics and fine-grained details, we curate the pretraining data. To comprehensively encompass quality- and aesthetics-related factors, we introduce degradation. To capture multi-scale quality and aesthetic information, we modify the model structure. Extensive experimental results on 11 downstream benchmarks show the superior performance of QPT V2 in comparison with current state-of-the-art approaches and other pretraining paradigms.

Qizhi Xie, Kun Yuan, Yunpeng Qu, Mingda Wu, Ming Sun, Chao Zhou, Jihong Zhu • 2024
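For readers unfamiliar with masked image modeling, the following is a minimal, generic sketch of its two core steps: hiding a random subset of image patches and scoring the reconstruction only on the hidden ones. It is not the QPT V2 implementation. The function names `random_patch_mask` and `masked_mse` are hypothetical, and the image size, patch size, and mask ratio in the usage lines are illustrative assumptions rather than values taken from the abstract.

```python
import random


def random_patch_mask(height, width, patch_size, mask_ratio, seed=None):
    """Return a rows x cols grid of booleans; True marks a patch hidden from the encoder."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")
    rows, cols = height // patch_size, width // patch_size
    num_patches = rows * cols
    num_masked = int(round(num_patches * mask_ratio))
    rng = random.Random(seed)
    # Sample patch indices without replacement.
    masked = set(rng.sample(range(num_patches), num_masked))
    return [[(r * cols + c) in masked for c in range(cols)] for r in range(rows)]


def masked_mse(pred_patches, target_patches, mask_flat):
    """Mean squared error computed over the pixels of masked patches only."""
    total, count = 0.0, 0
    for pred, target, is_masked in zip(pred_patches, target_patches, mask_flat):
        if is_masked:
            total += sum((p - t) ** 2 for p, t in zip(pred, target))
            count += len(pred)
    if count == 0:
        raise ValueError("no masked patches to score")
    return total / count


# Illustrative values only (assumed, not from the paper).
mask = random_patch_mask(height=224, width=224, patch_size=16, mask_ratio=0.75, seed=0)
num_hidden = sum(cell for row in mask for cell in row)
```

In a real pipeline the visible patches would be fed to an encoder and a decoder would predict the hidden ones; the sketch covers only the masking and the loss bookkeeping.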

Related benchmarks

Task	Dataset	Result
Video Quality Assessment	KoNViD-1k	SROCC 0.866	208
Video Quality Assessment	LIVE-VQC	SRCC 0.827	151
Video Quality Assessment	LSVQ (test)	SRCC 0.886	122
Video Quality Assessment	LSVQ 1080p	SRCC 0.785	116
Aesthetic Assessment	AVA (test)	SRCC 0.865	53
Image Quality Assessment	LIVE (Synthetic)	SRCC 0.972	11
Image Quality Assessment	TID2013 (Synthetic)	SRCC 0.874	11
Image Quality Assessment	KADID (Synthetic)	SRCC 0.897	11
Image Quality Assessment	FLIVE Real-world	SRCC 0.645	11
Image Quality Assessment	KonIQ10K Real-world	SRCC 0.913	11

Showing 10 of 11 rows
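The results above are reported as SRCC (also written SROCC), the Spearman rank-order correlation coefficient between predicted scores and subjective ratings. A minimal standard-library sketch of how such a value can be computed is given below. The helper names `average_ranks` and `srcc` are hypothetical, the sample scores are made up for illustration, and the evaluation code behind the reported numbers may differ.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srcc(predicted, subjective):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(predicted) != len(subjective) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rp, rs = average_ranks(predicted), average_ranks(subjective)
    mean_p, mean_s = sum(rp) / len(rp), sum(rs) / len(rs)
    cov = sum((a - mean_p) * (b - mean_s) for a, b in zip(rp, rs))
    var_p = sum((a - mean_p) ** 2 for a in rp)
    var_s = sum((b - mean_s) ** 2 for b in rs)
    if var_p == 0 or var_s == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_s)


# Made-up model outputs and mean opinion scores, for illustration only.
model_scores = [0.9, 0.4, 0.7, 0.1]
opinion_scores = [4.5, 2.0, 3.0, 3.5]
rho = srcc(model_scores, opinion_scores)
```

Because the metric depends only on rank order, a model can reach an SRCC of 1.0 without matching the scale of the subjective ratings; it only has to order the items identically.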
