
ML-CrAIST: Multi-scale Low-high Frequency Information-based Cross black Attention with Image Super-resolving Transformer

About

Recently, transformers have attracted significant interest in single-image super-resolution, demonstrating substantial performance gains. Current models rely heavily on the network's ability to extract high-level semantic details from images, while overlooking the effective use of multi-scale image details and intermediate information within the network. Furthermore, high-frequency regions of an image are considerably harder to super-resolve than low-frequency regions. This work proposes a transformer-based super-resolution architecture, ML-CrAIST, that addresses this gap by exploiting low-high frequency information at multiple scales. Unlike most previous works, which use either spatial or channel self-attention, we apply both, concurrently modeling pixel interactions along the spatial and channel dimensions and exploiting the inherent correlations across the two axes. Further, we devise a cross-attention block for super-resolution that explores the correlations between low- and high-frequency information. Quantitative and qualitative assessments indicate that the proposed ML-CrAIST surpasses state-of-the-art super-resolution methods (e.g., a 0.15 dB gain on Manga109 $\times$4). Code is available at: https://github.com/Alik033/ML-CrAIST.
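The cross-attention idea described above can be sketched in a few lines. This is a minimal, illustrative single-head version, not the paper's implementation: it assumes the high-frequency stream supplies the queries and the low-frequency stream supplies the keys and values, and it fakes the frequency split with a simple box blur (the residual acting as the high-frequency part). All names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats):
    """Single-head cross-attention: queries come from one feature
    stream, keys/values from the other (shapes: [n_tokens, dim])."""
    d_k = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d_k)   # [n_tokens, n_tokens]
    return softmax(scores, axis=-1) @ kv_feats      # [n_tokens, dim]

# Toy low/high-frequency split of a feature map (16 "pixels", 8 channels)
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))
# Box blur per channel approximates the low-frequency component
low = np.stack([np.convolve(c, np.ones(3) / 3, mode="same") for c in x.T],
               axis=1)
high = x - low                                      # residual = high-frequency

# High-frequency queries attend to low-frequency keys/values
fused = cross_attention(high, low)
print(fused.shape)
```

In the actual architecture the attended features would be fused back into the main branch; here the sketch only shows the attention step itself.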

Alik Pramanick, Utsav Bheda, Arijit Sur • 2024

Related benchmarks

Task                    Dataset    Metric  Result  Rank
Image Super-resolution  Manga109   LPIPS   0.0032  38
Super-Resolution        Set14      LPIPS   0.1173  6
Super-Resolution        B100       LPIPS   0.1812  6
Super-Resolution        Urban100   LPIPS   0.0101  6
Super-Resolution        Set5       LPIPS   0.1312  6
