Dual Aggregation Transformer for Image Super-Resolution
About
Transformers have recently gained considerable popularity in low-level vision tasks, including image super-resolution (SR). These networks apply self-attention along different dimensions, spatial or channel, and achieve impressive performance. This inspires us to combine the two dimensions in a Transformer for a more powerful representation capability. Based on this idea, we propose a novel Transformer model, the Dual Aggregation Transformer (DAT), for image SR. DAT aggregates features across the spatial and channel dimensions in a dual inter-block and intra-block manner. Specifically, we alternately apply spatial and channel self-attention in consecutive Transformer blocks. This alternating strategy enables DAT to capture global context and realize inter-block feature aggregation. Furthermore, we propose the adaptive interaction module (AIM) and the spatial-gate feed-forward network (SGFN) to achieve intra-block feature aggregation. AIM complements the two self-attention mechanisms from their corresponding dimensions, while SGFN introduces additional non-linear spatial information into the feed-forward network. Extensive experiments show that DAT surpasses current methods. Code and models are available at https://github.com/zhengchen1999/DAT.
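The core alternating idea can be sketched in a few lines: treat the feature map as a token matrix of shape (N, C), where spatial self-attention computes an N×N attention map over positions and channel self-attention computes a C×C map over channels (the "transposed" form), then interleave the two across blocks. The sketch below is a minimal illustration with identity Q/K/V projections and no learned weights, AIM, or SGFN; it is not the repository's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(x):
    # x: (N, C) tokens; attend over the N spatial positions -> (N, N) map.
    # Identity projections stand in for learned Q/K/V (illustrative only).
    attn = softmax(x @ x.T / np.sqrt(x.shape[1]))  # (N, N)
    return attn @ x                                # (N, C)

def channel_attention(x):
    # Attend over the C channels -> (C, C) "transposed" attention map.
    attn = softmax(x.T @ x / np.sqrt(x.shape[0]))  # (C, C)
    return (attn @ x.T).T                          # back to (N, C)

def alternating_blocks(x, depth=4):
    # Alternate spatial and channel self-attention across consecutive blocks,
    # mirroring DAT's inter-block aggregation strategy (hypothetical helper).
    for i in range(depth):
        x = spatial_attention(x) if i % 2 == 0 else channel_attention(x)
    return x

feats = np.random.default_rng(0).standard_normal((16, 8))  # 16 tokens, 8 channels
out = alternating_blocks(feats)
```

Both attention variants preserve the (N, C) token shape, which is what lets them be stacked in any order.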
Related benchmarks
| Task | Dataset | PSNR (dB) | Rank |
|---|---|---|---|
| Image Super-Resolution | Manga109 | 40.33 | 656 |
| Image Super-Resolution | Set5 (test) | 38.58 | 544 |
| Single Image Super-Resolution | Urban100 | 34.37 | 500 |
| Image Super-Resolution | Set14 | 34.81 | 329 |
| Image Super-Resolution | Urban100 | 34.37 | 221 |
| Super-Resolution | Set5 x2 | 38.58 | 134 |
| Super-Resolution | Set14 4x (test) | 29.23 | 117 |
| Super-Resolution | Set5 x2 (test) | 38.63 | 95 |
| Image Super-Resolution | Urban100 x4 (test) | 27.87 | 90 |
| Image Super-Resolution | BSDS100 | 32.61 | 85 |