Deep MANTA: A Coarse-to-fine Many-Task Network for joint 2D and 3D vehicle analysis from monocular image

About

In this paper, we present a novel approach, called Deep MANTA (Deep Many-Tasks), for many-task vehicle analysis from a given image. A robust convolutional network is introduced for simultaneous vehicle detection, part localization, visibility characterization and 3D dimension estimation. Its architecture is based on a new coarse-to-fine object proposal that boosts vehicle detection. Moreover, the Deep MANTA network is able to localize vehicle parts even if these parts are not visible. At inference time, the network's outputs are used by a real-time, robust pose estimation algorithm for fine orientation estimation and 3D vehicle localization. We show in experiments that our method outperforms monocular state-of-the-art approaches on vehicle detection, orientation and 3D location tasks on the very challenging KITTI benchmark.

Florian Chabot, Mohamed Chaouch, Jaonary Rabarisoa, Céline Teulière, Thierry Chateau • 2017
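
The abstract says that localized vehicle parts feed a pose estimation step, but it does not name the algorithm. As a rough illustration of the general idea only, the sketch below recovers a rotation and translation from matched 2D part locations and 3D part coordinates using a plain Direct Linear Transform (DLT). DLT is a stand-in chosen here for simplicity, not the paper's method, and it has none of the robustness the abstract mentions. The intrinsics `K`, the part coordinates, the ground-truth pose, and the function names `project` and `estimate_pose_dlt` are all hypothetical.

```python
import numpy as np

def project(points_3d, K, R, t):
    """Project Nx3 object-frame points to pixels with a pinhole camera."""
    cam = points_3d @ R.T + t          # object frame -> camera frame
    uv = cam @ K.T                     # apply the intrinsics
    return uv[:, :2] / uv[:, 2:3]      # perspective divide

def estimate_pose_dlt(points_3d, points_2d, K):
    """Recover (R, t) from >= 6 non-coplanar 2D/3D matches via DLT."""
    n = len(points_3d)
    ones = np.ones((n, 1))
    # Normalised image coordinates: undo the intrinsics.
    xy = (np.hstack([points_2d, ones]) @ np.linalg.inv(K).T)[:, :2]
    X = np.hstack([points_3d, ones])   # homogeneous 3D points, N x 4
    # Two linear constraints per match on the 12 entries of [R | t].
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = X
    A[0::2, 8:12] = -xy[:, 0:1] * X
    A[1::2, 4:8] = X
    A[1::2, 8:12] = -xy[:, 1:2] * X
    # Solution (up to scale): right singular vector of the smallest singular value.
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)
    # Snap the 3x3 block to the nearest orthogonal matrix and recover the scale.
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt
    t = P[:, 3] / S.mean()
    if np.linalg.det(R) < 0:           # fix the sign ambiguity of the null vector
        R, t = -R, -t
    return R, t

# Hypothetical camera intrinsics (illustrative values only).
K = np.array([[720.0, 0.0, 620.0],
              [0.0, 720.0, 190.0],
              [0.0, 0.0, 1.0]])

# Hypothetical 3D part coordinates in the vehicle frame, in metres.
parts_3d = np.array([
    [2.0, 0.8, 0.0], [2.0, -0.8, 0.1], [-2.0, 0.8, 0.0], [-2.1, -0.8, 0.2],
    [0.9, 0.7, 1.5], [0.8, -0.6, 1.4], [-1.0, 0.7, 1.5], [-1.2, -0.7, 1.3],
    [1.5, 0.0, 0.9], [-0.3, 0.2, 0.5],
])

# Hypothetical ground-truth pose: rotation about the camera y-axis plus a translation.
yaw = 0.4
R_true = np.array([[np.cos(yaw), 0.0, np.sin(yaw)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(yaw), 0.0, np.cos(yaw)]])
t_true = np.array([1.0, 1.5, 12.0])

# Simulate noise-free 2D part detections, then recover the pose from them.
parts_2d = project(parts_3d, K, R_true, t_true)
R_est, t_est = estimate_pose_dlt(parts_3d, parts_2d, K)
```

With noise-free matches, `R_est` and `t_est` should agree with the simulated pose up to numerical precision. Real part detections are noisy and may include outliers, so a practical system would need a robust solver rather than this bare linear estimate.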

Related benchmarks

Task	Dataset	Result
2D vehicle detection	KITTI (test)	AP (Easy) 96.4	29
Orientation Estimation	KITTI (test)	AOS (Moderate) 89.91	22
Orientation Estimation	KITTI (val1)	AOS (Easy) 97.6	10
2D Object Detection	KITTI (val1)	--	9
Orientation Estimation	KITTI (val2)	AOS (Easy) 97.44	8
2D vehicle detection	KITTI (val1)	AP (Easy) 97.9	5
3D Object Detection	KITTI (val1)	ALP1m (Easy) 70.9	5
3D Joint Vehicle Pose and Shape Reconstruction	ApolloCar3D (test)	A3DP Rel Error (c-l) 16.04	5
3D Object Detection	KITTI (val2)	ALP1m (Easy) 65.71	5
2D Object Detection	KITTI (val2)	AP2D 91.01	5

