CAMixerSR: Only Details Need More "Attention"
About
To meet the rapidly growing demand for large-image (2K-8K) super-resolution (SR), prevailing methods follow two independent tracks: 1) accelerating existing networks through content-aware routing, and 2) designing better super-resolution networks by refining the token mixer. Although straightforward, both tracks have inherent defects (e.g., inflexible routing or non-discriminative processing) that limit further improvement of the quality-complexity trade-off. To address these drawbacks, we integrate the two schemes by proposing a content-aware mixer (CAMixer), which assigns convolution to simple contexts and additional deformable window-attention to sparse textures. Specifically, the CAMixer uses a learnable predictor to generate multiple bootstraps: offsets for warping windows, a mask for classifying windows, and convolutional attentions that give convolution a dynamic property. These bootstraps modulate attention to include more useful textures self-adaptively and improve the representation capability of convolution. We further introduce a global classification loss to improve the accuracy of the predictors. By simply stacking CAMixers, we obtain CAMixerSR, which achieves superior performance on large-image SR, lightweight SR, and omnidirectional-image SR.
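
The routing idea in the abstract (a mask that sends only some windows to the attention branch) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `split_windows` and `route_windows` are hypothetical, the `ratio` parameter is an assumption, and per-window intensity variance is used here only as a hand-made stand-in for the learned predictor. The image height and width are assumed to be divisible by the window size.

```python
from statistics import pvariance


def split_windows(image, win):
    """Split a 2-D list (H x W) into non-overlapping win x win windows.

    Windows are returned in row-major order, each flattened to a list.
    Assumes H and W are divisible by win.
    """
    h, w = len(image), len(image[0])
    windows = []
    for top in range(0, h, win):
        for left in range(0, w, win):
            windows.append(
                [image[r][c]
                 for r in range(top, top + win)
                 for c in range(left, left + win)]
            )
    return windows


def route_windows(image, win=2, ratio=0.5):
    """Return one boolean per window: True = attention path, False = convolution only.

    Stand-in scoring (assumption): windows with higher intensity variance are
    treated as "detailed". CAMixer itself uses a learned predictor instead.
    """
    windows = split_windows(image, win)
    scores = [pvariance(w) for w in windows]
    k = max(1, round(ratio * len(windows)))  # number of windows sent to attention
    ranked = sorted(range(len(windows)), key=lambda i: scores[i], reverse=True)
    hard = set(ranked[:k])
    return [i in hard for i in range(len(windows))]


image = [
    [5, 5, 0, 9],
    [5, 5, 9, 0],
    [1, 1, 3, 3],
    [1, 1, 3, 4],
]
mask = route_windows(image, win=2, ratio=0.5)
print(mask)  # [False, True, False, True]
```

The two flat windows on the left are routed to the cheap path, while the two windows with intensity variation are flagged for attention. Lowering `ratio` trades quality for compute by sending fewer windows to the expensive branch.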
Related benchmarks
| Task | Dataset | Metric | Value (dB) | Rank |
|---|---|---|---|---|
| Super-Resolution | Set14 (test) | PSNR | 28.82 | 246 |
| Image Super-resolution | BSD100 (test) | PSNR | 27.72 | 216 |
| Super-Resolution | Urban100 (test) | PSNR | 26.63 | 205 |
| Super-Resolution | Set5 (test) | PSNR | 32.51 | 184 |
| Super-Resolution | ODI-SR (test) | WS-PSNR | 29.83 | 85 |
| Super-Resolution | SUN 360 Panorama (test) | WS-PSNR | 31.6 | 62 |
| Super-Resolution | Manga109 (test) | PSNR | 31.18 | 46 |
| Super-Resolution | DIV8K (test) | PSNR | 33.81 | 22 |
| Image Super-resolution | 2K (test) | PSNR | 26.39 | 19 |
| Image Super-resolution | F2K | PSNR | 29.31 | 17 |
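
For readers unfamiliar with the two metrics reported in the table, the sketch below shows how PSNR and WS-PSNR are commonly defined for single-channel images. It is an illustration only: the function names are hypothetical, images are plain nested lists, and the evaluation protocol behind the numbers above (color channel, border cropping, scale factor) is not stated here and is not reproduced. WS-PSNR weights each row of an equirectangular image by the cosine of its latitude, so errors near the poles count less.

```python
import math


def psnr(ref, out, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D lists."""
    n = len(ref) * len(ref[0])
    mse = sum((a - b) ** 2 for r, o in zip(ref, out) for a, b in zip(r, o)) / n
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def ws_psnr(ref, out, peak=255.0):
    """Latitude-weighted PSNR in dB for equirectangular images (2-D lists)."""
    h = len(ref)
    num = den = 0.0
    for i, (r, o) in enumerate(zip(ref, out)):
        # Row weight: cosine of the latitude at the row center.
        wgt = math.cos((i + 0.5 - h / 2) * math.pi / h)
        for a, b in zip(r, o):
            num += wgt * (a - b) ** 2
            den += wgt
    wmse = num / den
    return float("inf") if wmse == 0 else 10.0 * math.log10(peak ** 2 / wmse)


ref = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
uniform = [[25.5, 25.5] for _ in range(4)]        # same error everywhere
pole_err = [[25.5, 25.5], [0, 0], [0, 0], [0, 0]]  # error only in the top row
mid_err = [[0, 0], [25.5, 25.5], [0, 0], [0, 0]]   # error only near the equator

print(round(psnr(ref, uniform), 2))
print(ws_psnr(ref, pole_err) > ws_psnr(ref, mid_err))
```

With a uniform error the two metrics coincide, while the same error placed near a pole is penalized less by WS-PSNR than when it is placed near the equator.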