Unleashing the Power of Chain-of-Prediction for Monocular 3D Object Detection

About

Monocular 3D detection (Mono3D) aims to infer 3D bounding boxes from a single RGB image. Without auxiliary sensors such as LiDAR, this task is inherently ill-posed since the 3D-to-2D projection introduces depth ambiguity. Previous works often predict 3D attributes (e.g., depth, size, and orientation) in parallel, overlooking that these attributes are inherently correlated through the 3D-to-2D projection. However, simply enforcing such correlations through sequential prediction can propagate errors across attributes, especially when objects are occluded or truncated, where inaccurate size or orientation predictions can further amplify depth errors. Therefore, neither parallel nor sequential prediction is optimal. In this paper, we propose MonoCoP, an adaptive framework that learns when and how to leverage inter-attribute correlations with two complementary designs. A Chain-of-Prediction (CoP) explores inter-attribute correlations through feature-level learning, propagation, and aggregation, while an Uncertainty-Guided Selector (GS) dynamically switches between CoP and parallel paradigms for each object based on the predicted uncertainty. By combining their strengths, MonoCoP achieves state-of-the-art (SOTA) performance on KITTI, nuScenes, and Waymo, significantly improving depth accuracy, particularly for distant objects.
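The coupling between size and depth mentioned in the abstract can be illustrated with standard pinhole-camera geometry, where depth z, 3D height H, 2D box height h, and focal length f satisfy z = f * H / h. The sketch below is not MonoCoP's method. It only shows how an error in a predicted 3D size carries over proportionally into a depth estimate derived from it. The function name and all numeric values are assumptions chosen for illustration.

```python
def depth_from_height(focal_px: float, height_3d_m: float, height_2d_px: float) -> float:
    """Pinhole-camera depth estimate: z = f * H / h."""
    return focal_px * height_3d_m / height_2d_px


# Illustrative (assumed) values, not taken from any dataset.
focal_px = 700.0       # focal length in pixels
true_height_m = 1.5    # true 3D height of the object in metres
true_depth_m = 35.0    # true distance to the object in metres

# Height of the projected 2D box under the pinhole model.
height_2d_px = focal_px * true_height_m / true_depth_m

# Depth recovered from the correct 3D height.
z_exact = depth_from_height(focal_px, true_height_m, height_2d_px)

# Depth recovered when the 3D height is overestimated by 10%.
z_biased = depth_from_height(focal_px, true_height_m * 1.1, height_2d_px)

print(f"depth with exact height:  {z_exact:.2f} m")
print(f"depth with +10% height:   {z_biased:.2f} m")
print(f"relative depth error:     {(z_biased - z_exact) / z_exact:.1%}")
```

Under this model a 10% error in the size estimate produces a 10% error in depth, which is one way to see why chaining attribute predictions can amplify mistakes for occluded or truncated objects.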

Zhihao Zhang, Abhinav Kumar, Girish Chandar Ganesan, Xiaoming Liu • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Monocular 3D Detection	Waymo (val)	AP3D (All)	11.76	48
3D Object Detection (Vehicle)	Waymo Open Dataset LEVEL_1 (val)	3D AP Overall	11.76	46
3D Object Detection (Vehicle)	Waymo Open Dataset LEVEL_2 (val)	3D AP (Overall)	11.03	43
3D Object Detection	KITTI official (val)	AP40 Easy	32.06	31
3D Object Detection	KITTI official (test)	AP3D Easy	27.54	29
3D Object Detection	KITTI (val)	AP3D (Easy)	32.06	28
3D Object Detection	KITTI (test)	Car AP_3D (Easy)	27.54	22
3D Object Detection	KITTI (test)	AP3D (Easy)	27.54	15
Bird's eye view object detection	KITTI official (test)	mAP (Moderate)	25.57	14
3D Object Detection	KITTI official (test)	Pedestrian AP 3D R40 Easy	15.61	11

Showing 10 of 16 rows
