
ML-CrAIST: Multi-scale Low-high Frequency Information-based Cross black Attention with Image Super-resolving Transformer

About

Recently, transformers have attracted significant interest in single-image super-resolution, demonstrating substantial performance gains. Current models rely heavily on the network's ability to extract high-level semantic details from images, while overlooking the effective use of multi-scale image details and intermediate information within the network. Furthermore, high-frequency regions of an image are considerably harder to super-resolve than low-frequency regions. This work proposes a transformer-based super-resolution architecture, ML-CrAIST, that addresses this gap by exploiting low-high frequency information at multiple scales. Unlike most previous works, which use either spatial or channel self-attention, we apply both, concurrently modeling pixel interactions along the spatial and channel dimensions and exploiting the inherent correlations across the two axes. Further, we devise a cross-attention block for super-resolution that explores the correlations between low- and high-frequency information. Quantitative and qualitative assessments indicate that the proposed ML-CrAIST surpasses state-of-the-art super-resolution methods (e.g., a 0.15 dB gain on Manga109 $\times$4). Code is available at: https://github.com/Alik033/ML-CrAIST.
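The cross-attention idea described above can be sketched in a few lines. This is a minimal, illustrative single-head version, not the paper's implementation: it assumes the high-frequency stream supplies the queries and the low-frequency stream supplies the keys and values, and it fakes the frequency split with a simple box blur (the residual acting as the high-frequency part). All names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats):
    """Single-head cross-attention: queries come from one feature
    stream, keys/values from the other (shapes: [n_tokens, dim])."""
    d_k = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d_k)   # [n_tokens, n_tokens]
    return softmax(scores, axis=-1) @ kv_feats      # [n_tokens, dim]

# Toy low/high-frequency split of a feature map (16 "pixels", 8 channels)
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))
# Box blur per channel approximates the low-frequency component
low = np.stack([np.convolve(c, np.ones(3) / 3, mode="same") for c in x.T],
               axis=1)
high = x - low                                      # residual = high-frequency

# High-frequency queries attend to low-frequency keys/values
fused = cross_attention(high, low)
print(fused.shape)
```

In the actual architecture the attended features would be fused back into the main branch; here the sketch only shows the attention step itself.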

Alik Pramanick, Utsav Bheda, Arijit Sur • 2024

Related benchmarks

Task                    Dataset    Metric  Result  Rank
Image Super-resolution  Manga109   LPIPS   0.0032  38
Super-Resolution        Set14      LPIPS   0.1173  6
Super-Resolution        B100       LPIPS   0.1812  6
Super-Resolution        Urban100   LPIPS   0.0101  6
Super-Resolution        Set5       LPIPS   0.1312  6
