
QPT V2: Masked Image Modeling Advances Visual Scoring

About

Quality assessment and aesthetics assessment aim to evaluate the perceived quality and aesthetics of visual content. Current learning-based methods suffer greatly from the scarcity of labeled data and usually perform sub-optimally in terms of generalization. Although masked image modeling (MIM) has achieved noteworthy advancements across various high-level tasks (e.g., classification and detection), in this work we take a novel perspective and investigate its capabilities in terms of quality- and aesthetics-awareness. To this end, we propose Quality- and aesthetics-aware pretraining (QPT V2), the first pretraining framework based on MIM that offers a unified solution to quality and aesthetics assessment. To perceive the high-level semantics and fine-grained details, pretraining data is curated. To comprehensively encompass quality- and aesthetics-related factors, degradation is introduced. To capture multi-scale quality and aesthetic information, model structure is modified. Extensive experimental results on 11 downstream benchmarks clearly show the superior performance of QPT V2 in comparison with current state-of-the-art approaches and other pretraining paradigms.

Qizhi Xie, Kun Yuan, Yunpeng Qu, Mingda Wu, Ming Sun, Chao Zhou, Jihong Zhu • 2024
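
For readers unfamiliar with masked image modeling, the core idea is to hide a random subset of image patches and train the model to reconstruct them. The sketch below illustrates only that generic random patch-masking step. It is not QPT V2's implementation: the function name `random_patch_mask`, the 16-pixel patch size, and the 0.75 mask ratio are illustrative assumptions that do not come from the abstract above.

```python
import random


def random_patch_mask(height, width, patch_size, mask_ratio, seed=0):
    """Return a 2D grid of booleans; True marks a patch hidden from the encoder.

    Generic MIM-style masking sketch. All parameter values used by callers
    are assumptions for illustration, not settings reported for QPT V2.
    """
    rows = height // patch_size
    cols = width // patch_size
    num_patches = rows * cols
    num_masked = int(num_patches * mask_ratio)

    # Sample distinct patch indices to mask, reproducibly via a local RNG.
    rng = random.Random(seed)
    masked = set(rng.sample(range(num_patches), num_masked))

    # Lay the flat patch indices back out as a rows x cols grid.
    return [[(r * cols + c) in masked for c in range(cols)] for r in range(rows)]


if __name__ == "__main__":
    # Hypothetical setting: 224x224 image, 16x16 patches, 75% of patches masked.
    mask = random_patch_mask(224, 224, patch_size=16, mask_ratio=0.75)
    hidden = sum(sum(row) for row in mask)
    print(f"{hidden} of {len(mask) * len(mask[0])} patches masked")
```

During pretraining, a model would see only the unmasked patches and be asked to predict the content of the masked ones; how QPT V2 curates data, applies degradations, and modifies the model structure is described in the paper itself.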

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Quality Assessment | KoNViD-1k | SROCC 0.866 | 183 |
| Video Quality Assessment | LIVE-VQC | SRCC 0.827 | 111 |
| Video Quality Assessment | LSVQ (test) | SRCC 0.886 | 84 |
| Video Quality Assessment | LSVQ 1080p | SRCC 0.785 | 78 |
| Aesthetic Assessment | AVA (test) | SRCC 0.865 | 53 |
| Image Quality Assessment | LIVE (Synthetic) | SRCC 0.972 | 11 |
| Image Quality Assessment | TID2013 (Synthetic) | SRCC 0.874 | 11 |
| Image Quality Assessment | KADID (Synthetic) | SRCC 0.897 | 11 |
| Image Quality Assessment | FLIVE Real-world | SRCC 0.645 | 11 |
| Image Quality Assessment | KonIQ10K Real-world | SRCC 0.913 | 11 |
Showing 10 of 11 rows
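
The results above are reported as SRCC (also written SROCC), the Spearman rank-order correlation coefficient between predicted scores and human opinion scores. As a minimal sketch of how such a number can be computed, the pure-Python example below ranks both score lists (ties share their average rank) and takes the Pearson correlation of the ranks. The function names and the sample scores are hypothetical and are not taken from the paper or its evaluation code.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def srcc(predicted, ground_truth):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(predicted), average_ranks(ground_truth))


if __name__ == "__main__":
    # Hypothetical model predictions and mean opinion scores for four items.
    predicted = [0.2, 0.5, 0.4, 0.9]
    opinion_scores = [1.0, 3.0, 2.5, 4.5]
    print(round(srcc(predicted, opinion_scores), 3))
```

Because SRCC depends only on rank order, a model scores well when it orders items the same way human raters do, regardless of the scale of its raw outputs.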
