Cross-Layer Attentive Feature Upsampling for Low-latency Semantic Segmentation

About

Semantic segmentation is a fundamental problem in computer vision, and it requires high-resolution feature maps for dense prediction. Current coordinate-guided low-resolution feature interpolation methods, e.g., bilinear interpolation, produce coarse high-resolution features that suffer from feature misalignment and insufficient context information. Moreover, enriching high-resolution features with semantics carries a high computational cost, which makes it challenging to meet the requirements of low-latency inference. To tackle these issues, we propose a novel Guided Attentive Interpolation (GAI) method that adaptively interpolates fine-grained high-resolution features with semantic features. Guided Attentive Interpolation determines both spatial and semantic relations of pixels from features of different resolutions and then leverages these relations to interpolate high-resolution features with rich semantics. GAI can be integrated with any deep convolutional network for efficient semantic segmentation. In experiments, the GAI-based semantic segmentation networks, i.e., GAIN, achieve 78.8 mIoU at 22.3 FPS on Cityscapes and 80.6 mIoU at 64.5 FPS on CamVid using an NVIDIA 1080Ti GPU, which are new state-of-the-art results for low-latency semantic segmentation. Code and models are available at: https://github.com/hustvl/simpleseg.

Tianheng Cheng, Xinggang Wang, Junchao Liao, Wenyu Liu • 2026
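
The abstract describes interpolating high-resolution features by attending to low-resolution semantic features instead of using fixed coordinate weights. The following is a minimal NumPy sketch of that general idea, not the authors' GAI implementation. It assumes a plain scaled dot-product attention in which each high-resolution pixel is a query, each low-resolution pixel supplies a key, and the low-resolution semantic features are the values. The function name `attentive_upsample`, the tensor layouts, and the omission of any explicit spatial (positional) term are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attentive_upsample(high_res, low_res, low_res_semantic):
    """Hypothetical attention-based upsampling (illustrative only).

    high_res:         (Hh, Wh, C) guidance features, used as queries
    low_res:          (Hl, Wl, C) features used as keys
    low_res_semantic: (Hl, Wl, D) semantic features used as values
    Returns an (Hh, Wh, D) high-resolution semantic feature map.
    """
    Hh, Wh, C = high_res.shape
    Hl, Wl, D = low_res_semantic.shape

    q = high_res.reshape(Hh * Wh, C)           # one query per high-res pixel
    k = low_res.reshape(Hl * Wl, C)            # one key per low-res pixel
    v = low_res_semantic.reshape(Hl * Wl, D)   # one value per low-res pixel

    # Each row holds interpolation weights over all low-res pixels (sums to 1).
    weights = softmax(q @ k.T / np.sqrt(C), axis=-1)

    # Every output pixel is a weighted average of low-res semantic features.
    return (weights @ v).reshape(Hh, Wh, D)


rng = np.random.default_rng(0)
hi = rng.normal(size=(8, 8, 4))
lo = rng.normal(size=(4, 4, 4))
sem = rng.normal(size=(4, 4, 6))
print(attentive_upsample(hi, lo, sem).shape)
```

Unlike bilinear interpolation, the weights here depend on feature similarity rather than only on pixel coordinates. This dense version compares every high-resolution pixel with every low-resolution pixel, so its cost grows with the product of the two resolutions; a low-latency design would need something cheaper than this sketch.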

Related benchmarks

Task	Dataset	Result
Semantic segmentation	ADE20K (val)	mIoU 39.12	3069
Semantic segmentation	Cityscapes (test)	mIoU 78.2	1252
Semantic segmentation	CamVid (test)	mIoU 80.6	411
Semantic segmentation	PASCAL Context (val)	mIoU 47.48	360
Semantic segmentation	Cityscapes (val)	mIoU 78.8	18
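
The results above are reported as mIoU (mean intersection over union). As a reference for how that metric is commonly computed, here is a small pure-Python sketch. It assumes the standard definition: per-class IoU averaged over the classes that appear in either the prediction or the ground truth, expressed as a percentage. The function name `mean_iou` and the flat label lists are illustrative, and the evaluation protocols of the benchmarks listed here may differ in details such as ignored labels.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU in percent over flat lists of integer class labels."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both pred and target
            ious.append(inter / union)
    if not ious:
        return 0.0
    return 100.0 * sum(ious) / len(ious)


# Class 0: IoU = 1/2, class 1: IoU = 2/3, mean = 7/12.
print(round(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2), 2))  # 58.33
```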
