Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss

About

Three-dimensional object detection from a single view is a challenging task which, if performed with good accuracy, is an important enabler of low-cost mobile robot perception. Previous approaches to this problem suffer either from an overly complex inference engine or from insufficient detection accuracy. To deal with these issues, we present SS3D, a single-stage monocular 3D object detector. The framework consists of (i) a CNN, which outputs a redundant representation of each relevant object in the image with corresponding uncertainty estimates, and (ii) a 3D bounding box optimizer. We show how modeling heteroscedastic uncertainty improves performance over our baseline, and furthermore, how back-propagation can be done through the optimizer in order to train the pipeline end-to-end for additional accuracy. Our method achieves state-of-the-art accuracy on monocular 3D object detection, while running at 20 fps in a straightforward implementation. We argue that the SS3D architecture provides a solid framework upon which high-performing detection systems can be built, with autonomous driving as the main application in mind.

Eskil Jörgensen, Christopher Zach, Fredrik Kahl • 2019
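
The abstract's point about heteroscedastic uncertainty can be illustrated with a small sketch. It assumes the common Gaussian negative log-likelihood form, in which the network predicts a log-variance alongside each regressed value; the paper's exact loss may differ, and the function name `heteroscedastic_nll` is hypothetical.

```python
import math


def heteroscedastic_nll(pred, target, log_var):
    """Gaussian negative log-likelihood for one regressed value.

    Assumption: the network outputs log_var = log(sigma^2) for each value.
    The additive constant 0.5 * log(2 * pi) is dropped.
    """
    # The squared error is scaled down when the predicted variance is large,
    # while the 0.5 * log_var term penalizes claiming high uncertainty everywhere.
    return 0.5 * math.exp(-log_var) * (pred - target) ** 2 + 0.5 * log_var


# Same error of 2.0, scored under two different predicted variances.
confident = heteroscedastic_nll(3.0, 1.0, 0.0)            # sigma^2 = 1
uncertain = heteroscedastic_nll(3.0, 1.0, math.log(4.0))  # sigma^2 = 4
print(confident, uncertain)
```

With this form, a large error costs less when the model also reports a large variance for that output, so hard or ambiguous targets pull less on the shared weights.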

Related benchmarks

Task	Dataset	Result
3D Object Detection	KITTI car (test)	AP3D (Easy) 10.78	226
3D Object Detection	KITTI Pedestrian (test)	AP3D (Easy) 231	75
Bird's Eye View Detection	KITTI Car class official (test)	AP (Easy) 16.33	62
3D Object Detection	KITTI Cyclist (test)	AP3D Easy 280	59
3D Object Detection	KITTI cars (val)	AP Easy 14.52	48
3D Object Detection (Cars)	KITTI (test)	AP (Easy) 10.78	40
3D Object Detection	KITTI (test)	AP_R40 Easy 10.78	30
3D Object Detection	KITTI (test)	AP3D (Easy) 10.78	24
3D Object Detection	KITTI (val1)	AP R11 (Easy) 14.52	17
3D Object Detection	KITTI Split1 (val)	AP_R11 (Easy) 14.52	14

