GenPose: Generative Category-level Object Pose Estimation via Diffusion Models
About
Object pose estimation plays a vital role in embodied AI and computer vision, enabling intelligent agents to comprehend and interact with their surroundings. Despite the practicality of category-level pose estimation, current approaches struggle with partially observed point clouds, where several poses can explain the same observation; this is known as the multi-hypothesis issue. In this study, we propose a novel solution by reframing category-level object pose estimation as conditional generative modeling, departing from traditional point-to-point regression. Leveraging score-based diffusion models, we estimate object poses by sampling candidates from the diffusion model and aggregating them through a two-step process: filtering out outliers via likelihood estimation and then mean-pooling the remaining candidates. To avoid the costly integration process when estimating the likelihood, we introduce an alternative method that trains an energy-based model from the original score-based model, enabling end-to-end likelihood estimation. Our approach achieves state-of-the-art performance on the REAL275 dataset, surpassing 50% and 60% on the strict 5°2cm and 5°5cm metrics, respectively. Furthermore, our method demonstrates strong generalizability to novel categories sharing similar symmetric properties without fine-tuning and can readily adapt to object pose tracking tasks, yielding results comparable to the current state-of-the-art baselines.
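The two-step aggregation described in the abstract (rank candidates by estimated likelihood, drop the outliers, mean-pool the rest) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `aggregate_poses`, the `keep_ratio` parameter, the convention that a higher energy value means a more likely pose, and the use of a chordal mean (SVD projection onto SO(3)) to average rotations are all assumptions made for illustration.

```python
import numpy as np

def aggregate_poses(rotations, translations, energies, keep_ratio=0.6):
    """Hypothetical aggregation of sampled pose candidates.

    rotations:    (K, 3, 3) candidate rotation matrices
    translations: (K, 3) candidate translations
    energies:     (K,) per-candidate scores; assumed higher = more likely
    keep_ratio:   assumed fraction of candidates kept after filtering
    """
    rotations = np.asarray(rotations, dtype=float)
    translations = np.asarray(translations, dtype=float)
    energies = np.asarray(energies, dtype=float)

    # Step 1: keep the top-scoring candidates, discarding likely outliers.
    k = max(1, int(round(keep_ratio * len(energies))))
    keep = np.argsort(energies)[::-1][:k]

    # Step 2a: average rotations with a chordal mean. The arithmetic mean of
    # rotation matrices is generally not a rotation, so project it back onto
    # SO(3) with an SVD (an assumption; other rotation averages exist).
    mean_matrix = rotations[keep].mean(axis=0)
    u, _, vt = np.linalg.svd(mean_matrix)
    d = np.sign(np.linalg.det(u @ vt))
    r_mean = u @ np.diag([1.0, 1.0, d]) @ vt

    # Step 2b: translations live in a vector space, so a plain mean suffices.
    t_mean = translations[keep].mean(axis=0)
    return r_mean, t_mean
```

In this sketch the filtering step is what guards against the multi-hypothesis problem: averaging all candidates directly could blend distinct pose modes into a pose that matches none of them.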
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Category-level 6D Pose Estimation | REAL275 (test) | Pose Acc (5°/5cm) | 84.5 | 53 |
| Category-level 6D Object Pose Estimation | REAL275 | mAP (5°5cm) | 60.9 | 16 |
| Category-level 6D Object Pose Estimation | Camera | mAP (5°2cm) | 79.9 | 13 |
| Category-level Object Pose Estimation | Camera | Success Rate (5° 2cm) | 95.5 | 12 |
| Category-level 6D Object Pose Estimation | NOCS REAL275 | IoU@75 | 50 | 8 |
| Category-level pose tracking | REAL275 | 5°5cm Accuracy | 71.5 | 7 |
| 6D Pose Estimation | OMNI6DPOSE (test) | Success Rate (5° 2cm) | 6.6 | 7 |
| Category-level 6D Object Pose Estimation | ShapeNet-C (test) | Rotation Mean Error (°) | 48.29 | 7 |
| Category-level object pose tracking | REAL275 (test) | 5°5cm Accuracy | 71.5 | 6 |
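Several of the metrics above take the form n°m cm: a prediction counts as correct when its rotation error is within n degrees and its translation error is within m centimetres. The sketch below shows that check for a single prediction. The function names, the assumption that translations are given in metres, and the use of inclusive thresholds are illustrative choices; the sketch also ignores the special handling of symmetric object categories that benchmark evaluation code typically applies.

```python
import numpy as np

def rotation_error_deg(r_pred, r_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    cos_angle = (np.trace(r_pred.T @ r_gt) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from round-off.
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

def within_threshold(r_pred, t_pred, r_gt, t_gt, max_deg=5.0, max_cm=5.0):
    """Return True if the pose passes an n-degree / m-cm check.

    Translations are assumed to be in metres (hypothetical convention).
    """
    rot_err = rotation_error_deg(np.asarray(r_pred), np.asarray(r_gt))
    diff = np.asarray(t_pred, dtype=float) - np.asarray(t_gt, dtype=float)
    trans_err_cm = 100.0 * np.linalg.norm(diff)
    return bool(rot_err <= max_deg and trans_err_cm <= max_cm)
```

For example, a prediction that is off by 3° and 3 cm would pass a 5°5cm check but fail a 5°2cm check, which is why the 5°2cm results are described as the stricter ones.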