MTLSI-Net: A Linear Semantic Interaction Network for Parameter-Efficient Multi-Task Dense Prediction

About

Multi-task dense prediction performs multiple pixel-level tasks simultaneously. Capturing global cross-task interactions remains difficult, however, because standard self-attention has quadratic complexity in the number of tokens, which is prohibitive on high-resolution features. To address this limitation, we propose the Multi-Task Linear Semantic Interaction Network (MTLSI-Net), which facilitates cross-task interaction through linear attention. MTLSI-Net incorporates three key components: (1) a Multi-Task Multi-scale Query Linear Fusion Block, which captures cross-task dependencies across multiple scales with linear complexity using a shared global context matrix; (2) a Semantic Token Distiller, which compresses redundant features into compact semantic tokens, distilling essential cross-task knowledge; and (3) a Cross-Window Integrated Attention Block, which injects global semantics into local features via a dual-branch architecture, preserving both global consistency and spatial precision. Together, these components enable the network to capture comprehensive cross-task interactions at linear complexity with fewer parameters. Extensive experiments on NYUDv2 and PASCAL-Context demonstrate that MTLSI-Net achieves state-of-the-art performance, validating its effectiveness and efficiency in multi-task learning.
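
To make the idea of linear attention with a shared global context matrix concrete, here is a minimal NumPy sketch. It is not the MTLSI-Net implementation: it shows generic kernelized linear attention, and the feature map (elu(x) + 1), the function names, the task names, and the tensor shapes are all assumptions chosen for illustration. The key point is that the d x d_v context matrix is computed once from the keys and values and then reused by every task's queries, so the cost grows linearly with the number of tokens N and the N x N attention map is never formed.

```python
import numpy as np


def feature_map(x):
    # Positive feature map elu(x) + 1 (an assumed choice, not taken from the paper).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def shared_context(keys, values):
    # Build the global context once: O(N * d * d_v), no N x N matrix.
    phi_k = feature_map(keys)        # (N, d)
    context = phi_k.T @ values       # (d, d_v)
    k_sum = phi_k.sum(axis=0)        # (d,), used for normalization
    return context, k_sum


def linear_attend(queries, context, k_sum, eps=1e-6):
    # Each task's queries read from the same context matrix.
    phi_q = feature_map(queries)     # (N, d)
    numer = phi_q @ context          # (N, d_v)
    denom = phi_q @ k_sum + eps      # (N,)
    return numer / denom[:, None]


def quadratic_attend(queries, keys, values, eps=1e-6):
    # Reference version that materializes the full (N, N) weight matrix.
    weights = feature_map(queries) @ feature_map(keys).T
    return (weights @ values) / (weights.sum(axis=1, keepdims=True) + eps)


rng = np.random.default_rng(0)
num_tokens, dim = 64, 8                      # hypothetical sizes
keys = rng.standard_normal((num_tokens, dim))
values = rng.standard_normal((num_tokens, dim))
task_queries = {                             # hypothetical task names
    "semseg": rng.standard_normal((num_tokens, dim)),
    "depth": rng.standard_normal((num_tokens, dim)),
}

context, k_sum = shared_context(keys, values)
outputs = {
    task: linear_attend(q, context, k_sum) for task, q in task_queries.items()
}
```

Because matrix multiplication is associative, `linear_attend` returns the same result as `quadratic_attend` for this kernel, up to floating-point error. Only the order of the multiplications changes, and with it the cost.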

Chen Liu, Hengyu Man, Xiaopeng Fan, Debin Zhao • 2026

Related benchmarks

Task	Dataset	Metric	Result
Depth Estimation	NYU V2	RMSE	0.4904	167
Semantic segmentation	NYUD v2	mIoU	57.22	150
Surface Normal Estimation	Pascal Context	Mean Error (MAE)	13.71	45
Saliency Detection	Pascal Context	maxF Score	84.52	45
Semantic segmentation	Pascal Context	mIoU	80.86	42
Human Parsing	Pascal Context	mIoU	69.9	35
Boundary Detection	NYUD v2	ODS F-measure	78.6	30
