# FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

## About
We present FoundationPose, a unified foundation model for 6D object pose estimation and tracking, supporting both model-based and model-free setups. Our approach can be applied instantly at test time to a novel object without fine-tuning, as long as its CAD model is given or a small number of reference images are captured. We bridge the gap between these two setups with a neural implicit representation that allows for effective novel view synthesis, keeping the downstream pose estimation modules invariant under the same unified framework. Strong generalizability is achieved via large-scale synthetic training, aided by a large language model (LLM), a novel transformer-based architecture, and a contrastive learning formulation. Extensive evaluation on multiple public datasets involving challenging scenarios and objects indicates that our unified approach outperforms existing methods specialized for each task by a large margin. In addition, it even achieves results comparable to instance-level methods despite making fewer assumptions. Project page: https://nvlabs.github.io/FoundationPose/
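The framework's final stage ranks a set of candidate poses and keeps the best-scoring one (in the paper, scores come from a learned transformer comparing renders against the observation). The toy sketch below illustrates only the hypothesis-ranking idea: a geometric distance to a stand-in "ground-truth" rotation replaces the learned scoring network, and all function names are hypothetical, not the project's API.

```python
import numpy as np

def random_rotation(rng):
    """Sample a random 3x3 rotation matrix via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.diag(r))      # fix the sign ambiguity of the decomposition
    if np.linalg.det(q) < 0:      # ensure a proper rotation (det = +1)
        q[:, 0] *= -1
    return q

def geodesic_distance(r1, r2):
    """Angular distance between two rotations, in radians."""
    cos = (np.trace(r1.T @ r2) - 1.0) / 2.0
    return np.arccos(np.clip(cos, -1.0, 1.0))

def select_pose(hypotheses, score_fn):
    """Rank every candidate pose with score_fn and keep the highest-scoring one."""
    scores = [score_fn(h) for h in hypotheses]
    return hypotheses[int(np.argmax(scores))]

rng = np.random.default_rng(42)
gt = random_rotation(rng)                               # stand-in "true" object rotation
candidates = [random_rotation(rng) for _ in range(64)] + [gt]
best = select_pose(candidates, lambda r: -geodesic_distance(r, gt))
print(np.allclose(best, gt))  # the ground-truth candidate scores highest
```

In the actual system the score is produced by a network trained with the contrastive formulation mentioned above, so no ground-truth pose is needed at test time; this sketch only shows the select-the-best-hypothesis structure.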
## Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| 6DoF Pose Estimation | YCB-Video (test) | -- | 72 |
| 6D Object Pose Estimation | LineMOD | Average Accuracy: 99.9 | 50 |
| 6D Object Pose Estimation | BOP 7 core datasets: LM-O, T-LESS, TUD-L, IC-BIN, ITODD, HB, YCB-V (test) | AR (LM-O): 75.6 | 47 |
| Pose Estimation | BOP benchmark 2019 (test) | AR (LM-O): 78.8 | 43 |
| 6D Pose Tracking | YCB-Video (All Frames) | AUC (ADD): 96 | 14 |
| 6D Pose Estimation | Occluded YCB-Video (test) | ADD-S: 97.4 | 8 |
| 6D Object Pose Tracking | YCBInEOAT (test) | -- | 7 |
| 6D Object Pose Estimation | General Inference Efficiency Benchmark (test) | Inference Time (s): 2.7 | 6 |
| Object Pose Refinement | LM-O (test) | MSPD: 86 | 5 |
| 6DoF Pose Estimation | Novel 3D bin dataset 1.0 (test) | eTE (cm): 5.603 | 4 |