Neighborhood Attention Transformer
About
We present Neighborhood Attention (NA), the first efficient and scalable sliding-window attention mechanism for vision. NA is a pixel-wise operation that localizes self-attention (SA) to each pixel's nearest neighbors, and therefore has linear time and space complexity, compared to the quadratic complexity of SA. The sliding-window pattern allows NA's receptive field to grow without extra pixel shifts, and it preserves translational equivariance, unlike Swin Transformer's Window Self Attention (WSA). We develop NATTEN (Neighborhood Attention Extension), a Python package with efficient C++ and CUDA kernels, which allows NA to run up to 40% faster than Swin's WSA while using up to 25% less memory. We further present Neighborhood Attention Transformer (NAT), a new hierarchical transformer design based on NA that boosts image classification and downstream vision performance. Experimental results on NAT are competitive: NAT-Tiny reaches 83.2% top-1 accuracy on ImageNet, 51.4% mAP on MS-COCO, and 48.4% mIoU on ADE20K, improvements of 1.9% in ImageNet accuracy, 1.0% in COCO mAP, and 2.6% in ADE20K mIoU over a Swin model of similar size. To support more research based on sliding-window attention, we open source our project and release our checkpoints at: https://github.com/SHI-Labs/Neighborhood-Attention-Transformer .
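To make the idea of localizing attention to nearest neighbors concrete, the following is a minimal sketch in plain Python, reduced to one dimension and a single head. It is not the NATTEN implementation: the function names (`neighborhood_indices`, `neighborhood_attention_1d`), the 1D setting, and the choice to clamp the window at the borders so that every query sees exactly `kernel_size` keys are illustrative assumptions, and learned projections and positional biases are left out.

```python
import math


def neighborhood_indices(i, n, kernel_size):
    """Hypothetical helper: indices of the kernel_size positions nearest to i.

    The window is centered on i and clamped at the borders (an assumption of
    this sketch), so every query attends to exactly kernel_size keys.
    """
    start = min(max(i - kernel_size // 2, 0), n - kernel_size)
    return list(range(start, start + kernel_size))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def neighborhood_attention_1d(q, k, v, kernel_size):
    """Single-head 1D neighborhood attention over lists of d-dim vectors.

    Each query scores only kernel_size keys, so the cost is
    O(n * kernel_size * d): linear in n for a fixed kernel size, versus
    O(n^2 * d) for full self-attention.
    """
    n, d = len(q), len(q[0])
    assert kernel_size % 2 == 1 and kernel_size <= n
    scale = 1.0 / math.sqrt(d)
    out = []
    for i in range(n):
        nbrs = neighborhood_indices(i, n, kernel_size)
        # Scaled dot-product scores against the neighborhood only.
        scores = [scale * sum(a * b for a, b in zip(q[i], k[j])) for j in nbrs]
        weights = softmax(scores)
        # Weighted sum of the neighboring value vectors.
        out.append([sum(w * v[j][c] for w, j in zip(weights, nbrs))
                    for c in range(d)])
    return out
```

In this sketch, setting `kernel_size` equal to the sequence length makes every neighborhood cover the whole input, so the result reduces to ordinary self-attention. The 2D case used for images would apply the same windowing along both height and width.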
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic segmentation | ADE20K (val) | mIoU 49.7 | 2731 |
| Object Detection | COCO 2017 (val) | -- | 2454 |
| Image Classification | ImageNet-1K 1.0 (val) | Top-1 Accuracy 84.3 | 1866 |
| Instance Segmentation | COCO 2017 (val) | APm 0.452 | 1144 |
| Image Classification | ImageNet-1K | Top-1 Acc 81.8 | 836 |
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy 84.3 | 512 |
| Image Classification | ImageNet-1k (val) | Top-1 Acc 84.3 | 287 |
| Instance Segmentation | COCO | APmask 45.2 | 279 |
| Object Detection | MS-COCO 2017 (val) | -- | 237 |
| Object Detection | COCO | AP50 (Box) 71.1 | 190 |