Cross-Layer Attentive Feature Upsampling for Low-latency Semantic Segmentation

About

Semantic segmentation is a fundamental problem in computer vision, and it requires high-resolution feature maps for dense prediction. Current coordinate-guided methods for interpolating low-resolution features, e.g., bilinear interpolation, produce coarse high-resolution features that suffer from feature misalignment and insufficient context information. Moreover, enriching high-resolution features with semantics carries a high computational burden, which makes it challenging to meet the requirements of low-latency inference. To tackle these issues, we propose a novel Guided Attentive Interpolation (GAI) method that adaptively interpolates fine-grained high-resolution features with semantic features. GAI determines both spatial and semantic relations of pixels from features of different resolutions and then leverages these relations to interpolate high-resolution features with rich semantics. GAI can be integrated with any deep convolutional network for efficient semantic segmentation. In experiments, the GAI-based semantic segmentation networks, i.e., GAIN, achieve 78.8 mIoU at 22.3 FPS on Cityscapes and 80.6 mIoU at 64.5 FPS on CamVid using an NVIDIA 1080Ti GPU, which are new state-of-the-art results for low-latency semantic segmentation. Code and models are available at: https://github.com/hustvl/simpleseg.
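
The idea of interpolating high-resolution features by attending over low-resolution semantic features can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product cross-attention, with each high-resolution pixel acting as a query and each low-resolution pixel acting as both key and value. The function names (softmax, attentive_upsample) are hypothetical, and the learned projections and explicit spatial relations that the abstract mentions are left out.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attentive_upsample(high_res, low_res):
    """Illustrative cross-resolution attention (an assumption, not GAI itself).

    high_res: list of N feature vectors, one per fine-grid pixel (queries).
    low_res:  list of M feature vectors, one per coarse-grid pixel
              (used here as both keys and values).
    Returns N vectors, each a convex combination of the low-res vectors.
    """
    d = len(low_res[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in high_res:
        # Similarity between this fine pixel and every coarse pixel.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in low_res]
        weights = softmax(scores)
        # Weighted sum of coarse features, channel by channel.
        out.append([sum(w * v[c] for w, v in zip(weights, low_res))
                    for c in range(d)])
    return out

# Two fine pixels, two coarse pixels, two channels.
fine = [[1.0, 0.0], [0.0, 1.0]]
coarse = [[10.0, 0.0], [0.0, 10.0]]
upsampled = attentive_upsample(fine, coarse)
```

Unlike bilinear interpolation, whose weights depend only on pixel coordinates, the weights here depend on feature similarity, so each fine pixel draws mostly on the coarse pixels it resembles.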

Tianheng Cheng, Xinggang Wang, Junchao Liao, Wenyu Liu • 2026

Related benchmarks

Task                   Dataset               Result      Rank
Semantic segmentation  ADE20K (val)          mIoU 39.12  2731
Semantic segmentation  Cityscapes (test)     mIoU 78.2   1145
Semantic segmentation  CamVid (test)         mIoU 80.6   411
Semantic segmentation  PASCAL Context (val)  mIoU 47.48  323
Semantic segmentation  Cityscapes (val)      mIoU 78.8   18
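
For readers unfamiliar with the metric in these results, mIoU (mean intersection over union) averages, over classes, the overlap between predicted and ground-truth pixels divided by their union. The sketch below is a simplified illustration on flat label lists. The function name mean_iou is hypothetical, and the sketch assumes no ignore label and skips classes absent from both prediction and ground truth. Benchmark evaluations typically accumulate the counts over a whole dataset rather than per image.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for two equal-length label lists."""
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that never occur
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Four pixels, two classes: IoU is 1/2 for class 0 and 2/3 for class 1.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

Leaderboards usually report this value multiplied by 100, so the toy score above would appear as roughly 58.3.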