AlignPose: Generalizable 6D Pose Estimation via Multi-view Feature-metric Alignment

About

Single-view RGB model-based object pose estimation methods achieve strong generalization but are fundamentally limited by depth ambiguity, clutter, and occlusions. Multi-view pose estimation methods have the potential to solve these issues, but existing works rely on precise single-view pose estimates or lack generalization to unseen objects. We address these challenges via the following three contributions. First, we introduce AlignPose, a 6D object pose estimation method that aggregates information from multiple extrinsically calibrated RGB views and does not require any object-specific training or symmetry annotation. Second, the key component of this approach is a new multi-view feature-metric refinement specifically designed for object pose. It optimizes a single, consistent world-frame object pose minimizing the feature discrepancy between on-the-fly rendered object features and observed image features across all views simultaneously. Third, we report extensive experiments on four datasets (YCB-V, T-LESS, ITODD-MV, HouseCat6D) using the BOP benchmark evaluation and show that AlignPose outperforms other published methods, especially on challenging industrial datasets where multiple views are readily available in practice.
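The multi-view feature-metric refinement described above can be written down as a single optimization problem. The formula below is only a sketch of one plausible form. The abstract does not state the objective, so all notation here is an assumption: $T_{WO}$ is the world-frame object pose being optimized, $T_{C_v W}$ is the known extrinsic calibration of view $v$, $\Omega_v$ is the set of pixels covered by the rendered object in view $v$, $F_v^{\mathrm{obs}}$ and $F_v^{\mathrm{ren}}$ are the observed and rendered feature maps, and $\rho$ is a hypothetical robust cost.

```latex
% Sketch under assumed notation; not taken from the paper.
\hat{T}_{WO}
  = \operatorname*{arg\,min}_{T_{WO} \in SE(3)}
    \sum_{v=1}^{V} \sum_{p \in \Omega_v}
    \rho\!\left(
      \left\| F_v^{\mathrm{obs}}(p)
            - F_v^{\mathrm{ren}}\!\left(p;\, T_{C_v W}\, T_{WO}\right)
      \right\|_2^2
    \right)
```

The point of this form is that there is one shared unknown, $T_{WO}$, and every view contributes residuals to the same sum, rather than each view being refined separately and the results merged afterwards.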

Anna Šárová Mikeštíková, Médéric Fourmy, Martin Cífka, Josef Sivic, Vladimir Petrik • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Multi-view 6D pose estimation | YCB-V BOP (test) | AR 83.9 | 12 |
| Multi-view 6D pose estimation | T-LESS BOP (test) | AR 89.6 | 12 |
| Object Pose Estimation | T-LESS (seen) | AR 86.8 | 11 |
| Multi-view 6D pose estimation | ITODD-MV BOP (test) | Average Recall 68.8 | 3 |
| Multi-view 6D pose estimation | HouseCat6D 1.0 (test) | AR 85.6 | 3 |
