
Efficient Multi-Task Scene Analysis with RGB-D Transformers

About

Scene analysis is essential for enabling autonomous systems, such as mobile robots, to operate in real-world environments. However, obtaining a comprehensive understanding of the scene requires solving multiple tasks, such as panoptic segmentation, instance orientation estimation, and scene classification. Solving these tasks with the limited computing and battery capabilities of mobile platforms is challenging. To address this challenge, we introduce an efficient multi-task scene analysis approach, called EMSAFormer, that uses an RGB-D Transformer-based encoder to perform the aforementioned tasks simultaneously. Our approach builds upon the previously published EMSANet. However, we show that the dual CNN-based encoder of EMSANet can be replaced with a single Transformer-based encoder. To achieve this, we investigate how information from both RGB and depth data can be effectively incorporated in a single encoder. To accelerate inference on robotic hardware, we provide a custom NVIDIA TensorRT extension that enables highly optimized inference with our EMSAFormer approach. Through extensive experiments on the commonly used indoor datasets NYUv2, SUN RGB-D, and ScanNet, we show that our approach achieves state-of-the-art performance while still enabling inference at up to 39.1 FPS on an NVIDIA Jetson AGX Orin 32 GB.

Söhnke Benedikt Fischedick, Daniel Seichter, Robin Schmidt, Leonard Rabes, Horst-Michael Gross • 2023
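The abstract describes one RGB-D encoder whose features are shared by several task heads. The following NumPy sketch illustrates that general pattern under loudly stated assumptions: depth is fused early by stacking it as a fourth input channel (one possible option, not necessarily the fusion strategy EMSAFormer settles on), the Transformer encoder is reduced to a single-head self-attention layer with random weights, and the heads are simplified to per-token semantic logits, per-token orientation vectors, and one pooled scene-classification output. All names, sizes, and class counts are hypothetical.

```python
import numpy as np

# Hypothetical illustration only: a single shared encoder with early RGB-D
# fusion and several task heads. Sizes, names, and the fusion strategy are
# assumptions, not EMSAFormer's actual configuration.
rng = np.random.default_rng(0)

PATCH = 4          # assumed patch size
EMBED_DIM = 32     # assumed token dimension
NUM_CLASSES = 40   # assumed number of semantic classes
NUM_SCENES = 10    # assumed number of scene classes


def patch_embed(x, weight):
    """Split an (H, W, C) image into non-overlapping PATCH x PATCH patches
    and project each flattened patch to EMBED_DIM with a linear map."""
    h, w, c = x.shape
    gh, gw = h // PATCH, w // PATCH
    patches = x[: gh * PATCH, : gw * PATCH].reshape(gh, PATCH, gw, PATCH, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(gh * gw, PATCH * PATCH * c)
    return patches @ weight  # (num_tokens, EMBED_DIM)


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention with a residual connection."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax over keys
    return tokens + attn @ v


def init_params(in_channels=4):
    """Random weights; in_channels=4 because depth is stacked onto RGB."""
    d = EMBED_DIM
    shapes = {
        "embed": (PATCH * PATCH * in_channels, d),
        "wq": (d, d), "wk": (d, d), "wv": (d, d),
        "sem": (d, NUM_CLASSES),   # semantic head
        "ori": (d, 2),             # orientation head: (cos, sin)
        "scene": (d, NUM_SCENES),  # scene classification head
    }
    return {name: rng.normal(0.0, 0.02, size) for name, size in shapes.items()}


def multi_task_forward(rgb, depth, params):
    # Early fusion: depth becomes a fourth channel of one encoder input.
    rgbd = np.concatenate([rgb, depth[..., None]], axis=-1)
    tokens = patch_embed(rgbd, params["embed"])
    feats = self_attention(tokens, params["wq"], params["wk"], params["wv"])

    # All heads reuse the same shared features.
    semantic = feats @ params["sem"]                   # per-token class logits
    orient = feats @ params["ori"]
    orient /= np.linalg.norm(orient, axis=-1, keepdims=True) + 1e-8  # ~unit vectors
    scene = feats.mean(axis=0) @ params["scene"]       # pooled scene logits
    return {"semantic": semantic, "orientation": orient, "scene": scene}


rgb = rng.random((32, 48, 3))
depth = rng.random((32, 48))
out = multi_task_forward(rgb, depth, init_params())
print({name: arr.shape for name, arr in out.items()})
# {'semantic': (96, 40), 'orientation': (96, 2), 'scene': (10,)}
```

In this sketch every head reads the same `feats`, so the encoder runs once per frame no matter how many tasks are attached, and early fusion needs one backbone pass instead of one per modality as in a dual-encoder design. A real model would additionally decode the semantic and instance outputs back to full image resolution.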

Related benchmarks

Task                       Dataset                 Metric  Result  Rank
Semantic segmentation      Cityscapes              mIoU    60.76   658
Semantic segmentation      NYU v2 (test)           mIoU    51.06   282
Semantic segmentation      ScanNet (val)           mIoU    64.75   274
Surface normal estimation  NYU v2 (test)           --      --      224
Semantic segmentation      SUN RGB-D (test)        mIoU    48.82   212
Semantic segmentation      SUN RGB-D               mIoU    44.13   65
Instance segmentation      ScanNet (val)           --      --      62
Scene recognition          SUN RGB-D Scene (test)  --      --      25
Semantic segmentation      ScanNet20 (val)         mIoU    64.75   24
Panoptic segmentation      NYU v2 (test)           PQ      43.41   12

(10 of 20 benchmark rows shown.)
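Most rows above report mIoU (semantic segmentation) or PQ (panoptic segmentation). As a reference for reading those numbers, here is a minimal sketch of the commonly used definitions: mIoU averages the per-class intersection over union TP / (TP + FP + FN), and PQ for a single class is the sum of IoUs of matched segments divided by |TP| + 0.5|FP| + 0.5|FN|, with the reported PQ averaged over classes. The function names are hypothetical, and the official evaluation scripts for each benchmark may differ in details such as ignore labels, void regions, and which classes are averaged.

```python
import numpy as np


def mean_iou(conf):
    """Mean IoU from a confusion matrix where conf[i, j] counts pixels of
    ground-truth class i predicted as class j."""
    conf = np.asarray(conf, dtype=np.float64)
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp   # predicted as the class but labeled otherwise
    fn = conf.sum(axis=1) - tp   # labeled as the class but predicted otherwise
    denom = tp + fp + fn
    valid = denom > 0            # assumption: skip classes absent from labels and predictions
    return float(np.mean(tp[valid] / denom[valid]))


def panoptic_quality_single_class(matched_ious, num_fp, num_fn):
    """PQ for one class: matched_ious holds the IoU of each true-positive
    segment pair (a match requires IoU > 0.5)."""
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    return sum(matched_ious) / denom if denom > 0 else 0.0


miou = mean_iou([[50, 10], [5, 35]])
pq = panoptic_quality_single_class([0.8, 0.6], num_fp=1, num_fn=1)
print(round(miou, 4), round(pq, 4))  # 0.7346 0.4667
```

Note that PQ penalizes unmatched segments on both sides (false positives and false negatives) with half weight each, so a model can have a high mIoU yet a lower PQ if it merges or splits instances.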
