
DiNAT-IR: Exploring Dilated Neighborhood Attention for High-Quality Image Restoration

About

Transformers, with their self-attention mechanisms for modeling long-range dependencies, have become a dominant paradigm in image restoration tasks. However, the high computational cost of self-attention limits scalability to high-resolution images, making efficiency-quality trade-offs a key research focus. To address this, Restormer employs channel-wise self-attention, which computes attention across channels instead of spatial dimensions. While effective, this approach may overlook localized artifacts that are crucial for high-quality image restoration. To bridge this gap, we explore Dilated Neighborhood Attention (DiNA) as a promising alternative, inspired by its success in high-level vision tasks. DiNA balances global context and local precision by integrating sliding-window attention with mixed dilation factors, effectively expanding the receptive field without excessive overhead. However, our preliminary experiments indicate that directly applying this global-local design to the classic deblurring task hinders accurate visual restoration, primarily due to the constrained global context understanding within local attention. To address this, we introduce a channel-aware module that complements local attention, effectively integrating global context without sacrificing pixel-level precision. The proposed DiNAT-IR, a Transformer-based architecture specifically designed for image restoration, achieves competitive results across multiple benchmarks, offering a high-quality solution for diverse low-level computer vision problems.
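The core idea of Dilated Neighborhood Attention, each query attending only to a small window of neighbors spaced a dilation factor apart, can be illustrated with a minimal sketch. The following is a hypothetical single-head, 1-D simplification for intuition only; the actual DiNA operates on 2-D windows with learned query/key/value projections, multiple heads, and mixed dilation factors across blocks. The function name and all parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def dilated_neighborhood_attention_1d(x, kernel_size=3, dilation=2):
    """Toy 1-D dilated neighborhood attention (single head, no projections).

    Each position attends to at most `kernel_size` neighbors spaced
    `dilation` steps apart, so the receptive field spans
    dilation * (kernel_size - 1) + 1 positions while the per-query cost
    stays O(kernel_size). This is a didactic sketch, not DiNAT-IR itself.
    """
    n, c = x.shape
    half = kernel_size // 2
    out = np.zeros_like(x)
    for i in range(n):
        # Gather the valid dilated neighbor indices around position i
        # (out-of-range neighbors near the borders are simply dropped here).
        idx = [i + d * dilation for d in range(-half, half + 1)]
        idx = [j for j in idx if 0 <= j < n]
        neighbors = x[idx]                       # (k, c) neighbor values
        scores = neighbors @ x[i] / np.sqrt(c)   # (k,) scaled dot products
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
        out[i] = weights @ neighbors             # convex combination of neighbors
    return out

x = np.random.default_rng(0).normal(size=(16, 8))
y = dilated_neighborhood_attention_1d(x, kernel_size=3, dilation=2)
print(y.shape)  # → (16, 8)
```

With `kernel_size=3` and `dilation=2`, each output position mixes information from up to 5 positions away for the cost of 3 comparisons, which is the efficiency-quality trade-off the abstract describes.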

Hanzhou Liu, Binghan Li, Chengkai Liu, Mi Lu • 2025

Related benchmarks

Task               Dataset        Result        Rank
Deraining          Rain100L       PSNR 38.93    196
Image Deraining    2800 (test)    PSNR 33.91    42
Image Deraining    1200 (test)    PSNR 32.31    35
Deraining          Rain100H       PSNR 31.26    8
Deraining          Test100        PSNR 31.22    5
