
Transformer for Single Image Super-Resolution

About

Single image super-resolution (SISR) has witnessed great strides with the development of deep learning. However, most existing studies focus on building more complex networks with a massive number of layers. Recently, more and more researchers have begun to explore the application of the Transformer to computer vision tasks. However, the heavy computational cost and high GPU memory occupation of the vision Transformer cannot be ignored. In this paper, we propose a novel Efficient Super-Resolution Transformer (ESRT) for SISR. ESRT is a hybrid model consisting of a Lightweight CNN Backbone (LCB) and a Lightweight Transformer Backbone (LTB). Among them, the LCB can dynamically adjust the size of the feature map to extract deep features at a low computational cost. The LTB is composed of a series of Efficient Transformers (ETs), which occupy little GPU memory thanks to the specially designed Efficient Multi-Head Attention (EMHA). Extensive experiments show that ESRT achieves competitive results at low computational cost. Compared with the original Transformer, which occupies 16,057M of GPU memory, ESRT occupies only 4,191M. All code is available at https://github.com/luissen/ESRT.
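The memory saving in EMHA comes from splitting the attention computation into segments along the sequence dimension, so that only a small attention matrix is materialized at a time instead of the full n×n one. The sketch below illustrates this splitting idea in NumPy; the function name, segment count, and single-head layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def segmented_attention(q, k, v, num_splits=4):
    """Illustrative sketch of segment-wise attention (the idea behind EMHA).

    The n tokens are split into `num_splits` segments and attention is
    computed within each segment, so each attention matrix is
    (n/num_splits) x (n/num_splits) instead of n x n, reducing peak memory.
    Single head, no projections -- not the authors' implementation.
    """
    n, d = q.shape
    assert n % num_splits == 0, "sequence length must divide evenly"
    seg = n // num_splits
    out = np.empty_like(v)
    for i in range(num_splits):
        sl = slice(i * seg, (i + 1) * seg)
        # Scaled dot-product scores for this segment only: (seg, seg)
        scores = q[sl] @ k[sl].T / np.sqrt(d)
        # Numerically stable softmax over the last axis
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[sl] = weights @ v[sl]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))
out = segmented_attention(x, x, x, num_splits=4)
print(out.shape)  # (16, 8)
```

With 4 splits, the largest attention matrix is 4×4 rather than 16×16; for image-sized token sequences this quadratic-to-segmented reduction is what makes the Transformer branch affordable on a single GPU.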

Zhisheng Lu, Juncheng Li, Hong Liu, Chaoyan Huang, Linlin Zhang, Tieyong Zeng • 2021

Related benchmarks

Task                            Dataset            Result        Rank
Super-Resolution                Set5               PSNR 34.42    751
Super-Resolution                Urban100           PSNR 28.46    603
Super-Resolution                Set14              PSNR 30.43    586
Super-Resolution                BSD100             PSNR 29.15    313
Super-Resolution                Manga109           PSNR 33.95    298
Single Image Super-Resolution   Urban100 (test)    PSNR 32.58    289
Image Super-resolution          Manga109 (test)    PSNR 39.12    233
Image Super-resolution          BSD100 (test)      PSNR 32.25    216
Super-Resolution                Set5 x2            PSNR 38.03    134
Super-Resolution                Set5 x3            PSNR 34.42    108
Showing 10 of 28 rows
