TSP-Transformer: Task-Specific Prompts Boosted Transformer for Holistic Scene Understanding

About

Holistic scene understanding includes semantic segmentation, surface normal estimation, object boundary detection, depth estimation, and related tasks. The key challenge is to learn representations effectively, since the subtasks rely on attributes that are correlated yet distinct. Inspired by visual prompt tuning, we propose a Task-Specific Prompts Transformer, dubbed TSP-Transformer, for holistic scene understanding. It features a vanilla transformer in the early stage and a task-specific prompts transformer encoder in the later stage, which is augmented with task-specific prompts. By doing so, the transformer layers learn generic information from the shared parts and are endowed with task-specific capacity. First, the task-specific prompts serve as effective induced priors for each task. Moreover, they can be seen as switches that favor task-specific representation learning for different tasks. Extensive experiments on NYUD-v2 and PASCAL-Context show that our method achieves state-of-the-art performance, validating its effectiveness for holistic scene understanding. Our code is available at https://github.com/tb2-sy/TSP-Transformer.
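The two-stage layout described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: `toy_layer` is a hypothetical stand-in for a transformer layer (it only mixes in the mean token), the prompt count, layer counts, and task names are made up, and prepending the prompts once before the later stage is an assumption rather than a detail taken from the abstract.

```python
import random

def toy_layer(tokens):
    """Hypothetical stand-in for a transformer layer: add the mean token to every token.

    It only mimics the fact that, in self-attention, every token can be
    influenced by every other token. It is not a real attention block.
    """
    dim = len(tokens[0])
    mean = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    return [[t[d] + mean[d] for d in range(dim)] for t in tokens]

def make_prompts(num_prompts, dim, rng):
    """Randomly initialised prompt tokens (these would be learned in a real model)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_prompts)]

def forward(patch_tokens, task_prompts, num_shared=2, num_task=2):
    # Early stage: plain layers, run once and shared by all tasks.
    shared = patch_tokens
    for _ in range(num_shared):
        shared = toy_layer(shared)

    outputs = {}
    for task, prompts in task_prompts.items():
        # Later stage (assumed layout): prepend this task's prompts, run the
        # layers, then drop the prompt positions to recover patch features.
        tokens = prompts + shared
        for _ in range(num_task):
            tokens = toy_layer(tokens)
        outputs[task] = tokens[len(prompts):]
    return outputs

rng = random.Random(0)
dim, num_patches = 4, 6
patches = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_patches)]
task_prompts = {
    "semseg": make_prompts(3, dim, rng),  # illustrative task names
    "depth": make_prompts(3, dim, rng),
}
features = forward(patches, task_prompts)
for task, feats in features.items():
    print(task, len(feats), "patch tokens of dim", len(feats[0]))
```

Each task receives features of the same shape as the input patches, but the values differ between tasks because different prompts were mixed in during the later stage. This is the "switch" behaviour the abstract describes, shown here only in toy form.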

Shuo Wang, Jing Li, Zibo Zhao, Dongze Lian, Binbin Huang, Xiaomei Wang, Zhengxin Li, Shenghua Gao • 2023

Related benchmarks

Task	Dataset	Result
Surface Normal Estimation	NYU v2 (test)	--	224
Depth Estimation	NYU Depth V2	RMSE 0.4961	209
Depth Estimation	NYU V2	RMSE 0.4961	167
Semantic segmentation	NYUD v2	mIoU 55.39	150
Saliency Detection	Pascal Context (test)	maxF 84.86	57
Surface Normal Estimation	Pascal Context (test)	mErr 13.69	50
Surface Normal Estimation	Pascal Context	Mean Error (MAE) 13.69	45
Saliency Detection	Pascal Context	maxF Score 84.86	45
Semantic segmentation	Pascal Context	mIoU 81.48	42
Surface Normal Estimation	NYUD	mErr 18.44	38

Showing 10 of 16 rows
