Reinforced Axial Refinement Network for Monocular 3D Object Detection

About

Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image. This is an ill-posed problem whose major difficulty lies in the information lost by depth-agnostic cameras. Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of drawing an effective sample is relatively small in 3D space. To improve the efficiency of sampling, we propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step. This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it. The proposed framework, Reinforced Axial Refinement Network (RAR-Net), serves as a post-processing stage which can be freely integrated into existing monocular 3D detection methods, and improves the performance on the KITTI dataset with small extra computational costs.
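
The refinement loop described above can be illustrated with a small sketch. It is not the authors' implementation: the 7-parameter box layout, the fixed step size, the explicit "stop" action, and all function names are assumptions made for illustration. In particular, `oracle_policy` is a stand-in that looks at the ground truth, whereas the abstract describes a policy learned with reinforcement learning from the image.

```python
from typing import Callable, List, Tuple

# Assumed box parameterisation (not taken from the source): centre, size, yaw.
PARAMS = ("x", "y", "z", "h", "w", "l", "theta")

Box = List[float]
Action = Tuple[int, float]  # (parameter index, signed delta); index -1 means "stop"
Policy = Callable[[Box, List[Action]], Action]


def make_actions(step: float) -> List[Action]:
    """Build the action set: stop, or move exactly one parameter by +/- step."""
    actions: List[Action] = [(-1, 0.0)]
    for i in range(len(PARAMS)):
        actions.append((i, +step))
        actions.append((i, -step))
    return actions


def apply_action(box: Box, action: Action) -> Box:
    """Return a copy of the box with at most one parameter changed."""
    idx, delta = action
    new_box = list(box)
    if idx >= 0:
        new_box[idx] += delta
    return new_box


def refine(box: Box, policy: Policy, actions: List[Action], max_steps: int = 50) -> Box:
    """Apply single-parameter refinements until the policy stops or the budget ends."""
    for _ in range(max_steps):
        action = policy(box, actions)
        if action[0] < 0:  # the policy chose to stop
            break
        box = apply_action(box, action)
    return box


def oracle_policy(target: Box) -> Policy:
    """Hypothetical stand-in for a learned policy: greedily reduce L1 error to the target."""
    def l1(b: Box) -> float:
        return sum(abs(a - t) for a, t in zip(b, target))

    def policy(box: Box, actions: List[Action]) -> Action:
        # "stop" is listed first, so it wins ties and the loop always terminates.
        return min(actions, key=lambda a: l1(apply_action(box, a)))

    return policy


# Toy usage: the values below are made up, not KITTI data.
ground_truth = [1.0, 1.5, 20.0, 1.5, 1.6, 4.0, 0.3]
initial_guess = [1.4, 1.5, 21.0, 1.5, 1.6, 4.0, 0.1]
refined = refine(initial_guess, oracle_policy(ground_truth), make_actions(0.1))
print([round(v, 2) for v in refined])
```

Restricting each step to one parameter keeps the action set small (two moves per parameter plus stop), which is the property the abstract relies on to make the search more efficient than sampling whole boxes.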

Lijie Liu, Chufan Wu, Jiwen Lu, Lingxi Xie, Jie Zhou, Qi Tian • 2020

Related benchmarks

Task	Dataset	Metric	Result	
3D Object Detection	KITTI (test)	AP_3D Car (Easy)	16.37	60
Bird's Eye View (BEV) Detection	KITTI Cars (IoU3D ≥ 0.7) (test)	APBEV R40 (Easy)	22.45	52
3D Object Detection	KITTI (test)	AP Car (IoU=0.7) Easy	16.37	38
Monocular 2D Object Detection	KITTI (test)	AP40 (Easy)	16.37	20
3D Object Detection	KITTI Cars (IoU3D ≥ 0.7) (test)	AP3D R40 (Easy)	16.37	19
Bird's Eye View 3D Object Detection	KITTI (val1)	AP_BEV (IoU=0.5, Easy)	57.12	17
BEV Object Detection	KITTI (test)	AP Car (IoU=0.7) Easy	22.45	16
Joint vehicle detection and pose estimation	KITTI car (test)	AOS (Easy)	88.4	15
3D Object Detection	KITTI 1 (val)	AP3D Easy	54.17	14
Orientation Estimation	KITTI (val1)	AOS (Easy)	91.01	10

Showing 10 of 16 rows
