MUSIQ: Multi-scale Image Quality Transformer
About
Image quality assessment (IQA) is an important research topic for understanding and improving visual experience. The current state-of-the-art IQA methods are based on convolutional neural networks (CNNs). The performance of CNN-based models is often compromised by the fixed-shape constraint in batch training. To accommodate this constraint, the input images are usually resized and cropped to a fixed shape, which degrades image quality. To address this, we design a multi-scale image quality Transformer (MUSIQ) that processes native-resolution images with varying sizes and aspect ratios. With a multi-scale image representation, our proposed method can capture image quality at different granularities. Furthermore, a novel hash-based 2D spatial embedding and a scale embedding are proposed to support the positional embedding in the multi-scale representation. Experimental results verify that our method achieves state-of-the-art performance on multiple large-scale IQA datasets such as PaQ-2-PiQ, SPAQ and KonIQ-10k.
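To make the multi-scale representation and the hash-based 2D spatial embedding more concrete, here is a minimal sketch of how patches from images of arbitrary size could be assigned positional indices. It assumes that a patch at row `r` of `rows` is mapped to cell `floor(r * G / rows)` of a fixed `G x G` grid, and that scale 0 is the native-resolution image followed by aspect-ratio-preserving resized copies. The function names, the patch size, the resize targets and the grid size are illustrative assumptions, not values taken from the text above.

```python
import math


def hashed_index(i: int, count: int, grid_size: int) -> int:
    """Map patch index i in [0, count) to a cell in [0, grid_size).

    Assumed hashing rule: proportional position along the axis, so images
    of any size share the same fixed set of spatial embeddings.
    """
    return (i * grid_size) // count


def patch_grid(height: int, width: int, patch_size: int) -> tuple:
    """Number of patch rows and columns; border patches are assumed padded."""
    return math.ceil(height / patch_size), math.ceil(width / patch_size)


def resize_keep_aspect(height: int, width: int, longer_side: int) -> tuple:
    """Target size whose longer side equals longer_side, aspect ratio kept."""
    scale = longer_side / max(height, width)
    return max(1, round(height * scale)), max(1, round(width * scale))


def multiscale_tokens(height, width, patch_size=32,
                      longer_sides=(224, 384), grid_size=10):
    """Return one (scale_id, grid_row, grid_col) triple per patch.

    scale_id selects the scale embedding; (grid_row, grid_col) selects the
    hashed 2D spatial embedding. All default values are hypothetical.
    """
    sizes = [(height, width)]  # scale 0: native resolution
    sizes += [resize_keep_aspect(height, width, s) for s in longer_sides]
    tokens = []
    for scale_id, (h, w) in enumerate(sizes):
        rows, cols = patch_grid(h, w, patch_size)
        for r in range(rows):
            for c in range(cols):
                tokens.append((scale_id,
                               hashed_index(r, rows, grid_size),
                               hashed_index(c, cols, grid_size)))
    return tokens
```

Because every patch index is scaled into the same fixed grid, a wide panorama and a small square thumbnail both draw from the same table of spatial embeddings, while the scale index tells the model which resolution a patch came from.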
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Classification | ImageNet 1k (test) | Top-1 Accuracy | 77.9 | 798 |
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy | 77.9 | 512 |
| Image Quality Assessment | SPAQ | SRCC | 0.918 | 191 |
| Image Quality Assessment | CSIQ | SRC | 0.871 | 138 |
| Image Quality Assessment | TID 2013 (test) | Mean SRCC | 0.584 | 124 |
| Image Quality Assessment | AGIQA-3K | SRCC | 0.82 | 112 |
| Image Quality Assessment | CSIQ (test) | SRCC | 0.71 | 103 |
| Image Quality Assessment | KonIQ-10k | SRCC | 0.824 | 96 |
| Image Quality Assessment | LIVE | SRC | 0.94 | 96 |
| Image Quality Assessment | KADID | SRCC | 55.6 | 95 |
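Most IQA results in the table are reported as SRCC, the Spearman rank correlation coefficient between a model's predicted quality scores and human mean opinion scores (MOS). As a rough illustration of what the metric measures, the sketch below computes it with the Python standard library only, using average ranks for ties. The function names are hypothetical, and this is not the evaluation code behind the numbers above.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def srcc(predicted, mos):
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    if len(predicted) != len(mos) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(predicted), average_ranks(mos)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("SRCC is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)
```

SRCC depends only on the ordering of the scores, so a model is rewarded for ranking images the way human raters do even if its raw scores are on a different scale from the MOS.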