MonoScene: Monocular 3D Semantic Scene Completion

About

MonoScene proposes a 3D Semantic Scene Completion (SSC) framework in which the dense geometry and semantics of a scene are inferred from a single monocular RGB image. Unlike the existing SSC literature, which relies on 2.5D or 3D input, we solve the complex problem of 2D-to-3D scene reconstruction while jointly inferring the scene's semantics. Our framework relies on successive 2D and 3D UNets bridged by a novel 2D-3D feature projection inspired by optics, and introduces a 3D context relation prior to enforce spatio-semantic consistency. Along with these architectural contributions, we introduce novel global scene and local frustum losses. Experiments show that we outperform the literature on all metrics and datasets while hallucinating plausible scenery even beyond the camera field of view. Our code and trained models are available at https://github.com/cv-rits/MonoScene.

Anh-Quan Cao, Raoul de Charette • 2021
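
The optics-inspired 2D-3D feature projection mentioned in the abstract can be illustrated with a pinhole camera model: each voxel center is projected onto the image plane and picks up the 2D feature at the pixel it lands on, so all voxels along the same line of sight share one feature. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The function name `lift_features`, the nearest-pixel sampling, and the choice to leave zeros for voxels outside the field of view are all hypothetical choices made for illustration.

```python
import numpy as np

def lift_features(feat2d, voxel_centers, K):
    """Assign each 3D voxel the 2D feature found along its line of sight.

    feat2d:        (C, H, W) image feature map.
    voxel_centers: (N, 3) voxel centers in camera coordinates (X, Y, Z).
    K:             (3, 3) pinhole intrinsics matrix.
    Returns an (N, C) array; voxels that fall behind the camera or outside
    the image keep an all-zero feature (an assumption of this sketch).
    """
    C, H, W = feat2d.shape
    out = np.zeros((voxel_centers.shape[0], C), dtype=feat2d.dtype)

    # Only points in front of the camera can be projected.
    in_front = voxel_centers[:, 2] > 0
    pts = voxel_centers[in_front]

    # Pinhole projection: [u*z, v*z, z]^T = K [X, Y, Z]^T.
    uvw = pts @ K.T
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)

    # Keep projections that land inside the feature map.
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    idx = np.flatnonzero(in_front)[inside]

    # feat2d[:, v, u] has shape (C, M); transpose to (M, C).
    out[idx] = feat2d[:, v[inside], u[inside]].T
    return out

# Toy example: a 4x4 single-channel feature map and a simple camera.
K = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 2.0],
              [0.0, 0.0, 1.0]])
feat2d = np.arange(16, dtype=float).reshape(1, 4, 4)
voxels = np.array([[0.0, 0.0, 1.0],    # on the optical axis, near
                   [0.0, 0.0, 3.0],    # same ray, farther away
                   [10.0, 0.0, 1.0],   # outside the field of view
                   [0.0, 0.0, -1.0]])  # behind the camera
print(lift_features(feat2d, voxels, K))
```

In this sketch the first two voxels lie on the same ray and therefore receive the same feature, which is why a subsequent 3D network is needed to disambiguate depth along each line of sight.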

Related benchmarks

Task	Dataset	Result
3D Occupancy Prediction	Occ3D-nuScenes (val)	mIoU 606	215
Semantic Scene Completion	SemanticKITTI (val)	mIoU (Mean IoU) 11.5	119
Semantic Scene Completion	NYU v2 (test)	Ceiling Error 8.89	81
Semantic Scene Completion	SemanticKITTI (test)	SSC mIoU 11.1	74
Semantic Occupancy Prediction	SemanticKITTI (test)	mIoU 34.2	67
3D Semantic Occupancy Prediction	SemanticKITTI (val)	mIoU 11.5	59
3D Semantic Occupancy Prediction	SurroundOcc-nuScenes (val)	mIoU 7.31	59
3D Semantic Occupancy Prediction	nuScenes-Occupancy (val)	mIoU 7.3	51
Semantic Scene Completion	SemanticKITTI official (test)	mIoU 34.2	50
Scene Completion	NYU v2 (test)	--	48

Showing 10 of 62 rows
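
Most rows above report mIoU, the mean over classes of the intersection-over-union between predicted and ground-truth voxel labels. The following is a minimal sketch of that metric, assuming integer label arrays and a hypothetical `ignore_index` of 255 for unlabeled voxels. Individual benchmarks differ in which classes they average over and in how they treat classes absent from a scene, so this is not the official evaluation code of any dataset listed here.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union across classes, as a fraction in [0, 1].

    pred, target: integer label arrays of identical shape (e.g. a voxel grid).
    Voxels whose target equals ignore_index are excluded. Classes that appear
    in neither pred nor target are skipped (an assumption of this sketch).
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (target == c))
        union = np.sum((pred == c) | (target == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")

# Toy example with three classes and one ignored voxel.
pred = np.array([0, 1, 1, 2, 0])
target = np.array([0, 1, 2, 2, 255])
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {100 * score:.2f}%")
```

The function returns a fraction; benchmark tables typically report the value multiplied by 100.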
