QPT V2: Masked Image Modeling Advances Visual Scoring
About
Quality assessment and aesthetics assessment aim to evaluate the perceived quality and aesthetics of visual content. Current learning-based methods suffer greatly from the scarcity of labeled data and usually generalize sub-optimally. Although masked image modeling (MIM) has achieved noteworthy advancements across various high-level tasks (e.g., classification, detection), in this work we take a novel perspective and investigate its capabilities in terms of quality- and aesthetics-awareness. To this end, we propose quality- and aesthetics-aware pretraining (QPT V2), the first pretraining framework based on MIM that offers a unified solution to quality and aesthetics assessment. To perceive high-level semantics and fine-grained details, the pretraining data is curated. To comprehensively encompass quality- and aesthetics-related factors, degradation is introduced. To capture multi-scale quality and aesthetic information, the model structure is modified. Extensive experimental results on 11 downstream benchmarks show the superior performance of QPT V2 in comparison with current state-of-the-art approaches and other pretraining paradigms.
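
For readers unfamiliar with MIM, the following is a minimal sketch of the generic idea: hide a random subset of image patches and score the reconstruction only on the hidden ones. It is not the QPT V2 implementation. The function names, the 14x14 patch grid, and the 0.75 mask ratio are illustrative assumptions, and the data curation, degradation, and multi-scale model changes described above are not modeled here.

```python
import random


def random_patch_mask(grid_h, grid_w, mask_ratio, seed=None):
    """Return a flat list of booleans, one per patch; True marks a masked patch.

    Generic random masking; the grid size and ratio are assumptions, not
    values taken from the paper.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    rng = random.Random(seed)
    num_patches = grid_h * grid_w
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def masked_mse(pred_patches, target_patches, mask):
    """Mean squared error computed over masked patches only.

    pred_patches and target_patches are lists of equal-length pixel lists,
    one entry per patch, aligned with mask.
    """
    total, count = 0.0, 0
    for pred, target, is_masked in zip(pred_patches, target_patches, mask):
        if not is_masked:
            continue  # visible patches do not contribute to the loss
        for p, t in zip(pred, target):
            total += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("no masked patches to score")
    return total / count


if __name__ == "__main__":
    mask = random_patch_mask(14, 14, 0.75, seed=0)
    print(sum(mask), "of", len(mask), "patches masked")
```

Restricting the loss to masked patches forces the model to infer hidden content from the visible context, which is the property the abstract proposes to exploit for quality- and aesthetics-awareness.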
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Quality Assessment | KoNViD-1k | SROCC 0.866 | 183 |
| Video Quality Assessment | LIVE-VQC | SRCC 0.827 | 111 |
| Video Quality Assessment | LSVQ (test) | SRCC 0.886 | 84 |
| Video Quality Assessment | LSVQ 1080p | SRCC 0.785 | 78 |
| Aesthetic Assessment | AVA (test) | SRCC 0.865 | 53 |
| Image Quality Assessment | LIVE (Synthetic) | SRCC 0.972 | 11 |
| Image Quality Assessment | TID2013 (Synthetic) | SRCC 0.874 | 11 |
| Image Quality Assessment | KADID (Synthetic) | SRCC 0.897 | 11 |
| Image Quality Assessment | FLIVE Real-world | SRCC 0.645 | 11 |
| Image Quality Assessment | KonIQ10K Real-world | SRCC 0.913 | 11 |
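
The results above are reported as SRCC (also written SROCC), the Spearman rank-order correlation coefficient between predicted scores and human opinion scores. As a minimal sketch of how such a number is computed, the snippet below ranks both score lists (ties share an average rank) and takes the Pearson correlation of the ranks. The function names and the sample scores are illustrative assumptions, not the evaluation code or data behind the table.

```python
import math


def average_ranks(values):
    """Return 1-based ranks for values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def srcc(predicted, ground_truth):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    if len(predicted) != len(ground_truth) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(predicted), average_ranks(ground_truth))


if __name__ == "__main__":
    # Hypothetical model predictions and mean opinion scores for five images.
    predictions = [0.62, 0.15, 0.88, 0.40, 0.71]
    opinion_scores = [3.9, 1.8, 4.6, 3.1, 3.5]
    print(round(srcc(predictions, opinion_scores), 3))
```

Because SRCC depends only on rank order, a model is not penalized for predicting on a different scale than the human scores, only for ordering items differently.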