Category-Level Metric Scale Object Shape and Pose Estimation

About

Advances in deep-learning-based recognition have led to accurate object detection in 2D images. However, these 2D perception methods do not provide complete 3D information about the world. Meanwhile, advanced 3D shape estimation approaches focus on the shape itself without considering metric scale, so they cannot accurately determine the location and orientation of objects. To tackle this problem, we propose a framework that jointly estimates a metric scale shape and pose from a single RGB image. Our framework has two branches: the Metric Scale Object Shape branch (MSOS) and the Normalized Object Coordinate Space branch (NOCS). The MSOS branch estimates the metric scale shape observed in camera coordinates. The NOCS branch predicts the normalized object coordinate space (NOCS) map and performs a similarity transformation with the depth map rendered from the predicted metric scale mesh to obtain the 6D pose and size. Additionally, we introduce Normalized Object Center Estimation (NOCE) to estimate the geometrically aligned distance from the camera to the object center. We validated our method on both synthetic and real-world datasets to evaluate category-level object pose and shape.
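The similarity-transformation step mentioned in the abstract can be illustrated with a short sketch. The abstract does not say which solver the authors use; the code below assumes the closed-form Umeyama least-squares alignment, a common choice in NOCS-style pipelines. It also assumes that the NOCS map and the rendered depth map have already been converted into two matched (N, 3) point sets. The function name `estimate_similarity` and its arguments are hypothetical, not taken from the paper.

```python
import numpy as np


def estimate_similarity(src, dst):
    """Least-squares scale s, rotation R, translation t with dst ~ s * R @ src + t.

    Assumption: closed-form Umeyama alignment; the paper may use a different solver.
    src: (N, 3) points in normalized object coordinates (e.g. from a NOCS map).
    dst: (N, 3) matching points in camera coordinates (e.g. back-projected depth).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)

    # Center both point sets.
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # 3x3 cross-covariance between target and source.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

Under these assumptions, `R` and `t` give the rotation and translation of the object in the camera frame, and the scale `s` relates the normalized coordinates to metric size. A practical pipeline would typically wrap such a solver in an outlier-rejection loop such as RANSAC, which is omitted here.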

Taeyeop Lee, Byeong-Uk Lee, Myungchul Kim, In So Kweon • 2021

Related benchmarks

Task	Dataset	Result
Category-level 6D Pose Estimation	REAL275 (test)	Pose Acc (5°/5cm): 50.8	53
6D Pose and Size Estimation	REAL275	5°5cm: 0.054	50
9D Pose Estimation	REAL275 (test)	--	38
Category-level 6D Object Pose Estimation	REAL275	--	34
6D Pose Estimation	NOCS REAL275	Accuracy (5°5cm): 5.3	14
3D Object Detection	REAL275	mAP@IoU75: 8.4	12
Pose Estimation	NOCS (test)	mAP IoU 50: 68.1	10
Pose Estimation	NOCS REAL275 (test)	mAP (IoU=0.50): 0.681	10
Category-level 9D Pose Estimation	CAMERA25 (test)	IoU@50: 32.4	8
3D Object Detection	NOCS CAMERA25	IoU@25: 93.8	6

Showing 10 of 13 rows
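Several rows above report a 5°5cm pose metric. As a rough illustration of what that threshold means for a single prediction, the sketch below checks whether the rotation error is within 5 degrees and the translation error within 5 cm. The function name, the inclusive comparison, and the assumption that translations are given in metres are all hypothetical. The benchmark protocols behind the table aggregate such checks over detections into accuracy or mAP and treat symmetric categories specially, neither of which is reproduced here.

```python
import numpy as np


def pose_within_threshold(R_pred, t_pred, R_gt, t_gt, deg_thresh=5.0, cm_thresh=5.0):
    """Return True if rotation error <= deg_thresh and translation error <= cm_thresh.

    Assumptions: R_* are 3x3 rotation matrices, t_* are 3-vectors in metres,
    and object symmetries are ignored.
    """
    R_pred = np.asarray(R_pred, dtype=float)
    R_gt = np.asarray(R_gt, dtype=float)

    # Geodesic angle between the two rotations, from the trace of the relative rotation.
    cos_angle = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    angle_deg = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

    # Euclidean translation error, converted from metres to centimetres.
    diff = np.asarray(t_pred, dtype=float) - np.asarray(t_gt, dtype=float)
    trans_cm = np.linalg.norm(diff) * 100.0

    return bool(angle_deg <= deg_thresh and trans_cm <= cm_thresh)
```

For example, a prediction rotated 3 degrees away from the ground truth and offset by 2 cm would pass this check, while one rotated by 10 degrees would fail regardless of its translation error.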
