Detect Anything 3D in the Wild
About
Despite the success of deep learning in closed-set 3D object detection, existing approaches struggle to generalize zero-shot to novel objects and camera configurations. We introduce DetAny3D, a promptable 3D detection foundation model that can detect any novel object under arbitrary camera configurations using only monocular inputs. Training a foundation model for 3D detection is fundamentally constrained by the scarcity of annotated 3D data, so DetAny3D leverages the rich prior knowledge embedded in extensively pre-trained 2D foundation models to compensate. To transfer this 2D knowledge to 3D effectively, DetAny3D incorporates two core modules: the 2D Aggregator, which aligns features from different 2D foundation models, and the 3D Interpreter with Zero-Embedding Mapping, which stabilizes early training during 2D-to-3D knowledge transfer. Experimental results validate the strong generalization of DetAny3D: it not only achieves state-of-the-art performance on unseen categories and novel camera configurations, but also surpasses most competitors on in-domain data. DetAny3D highlights the potential of 3D foundation models for diverse real-world applications, e.g., rare object detection in autonomous driving, and shows promise for further exploration of 3D-centric tasks in open-world settings. More visualization results are available in our code repository.
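The abstract does not spell out how Zero-Embedding Mapping works. One common way to "stabilize early training" when injecting a new signal into a pre-trained pathway is a zero-initialized projection added residually: at step 0 it outputs exactly zero, so the host features are untouched, yet its weights still receive non-zero gradients and can grow gradually. The sketch below illustrates only that generic technique, assuming a plain linear mapping; the class `ZeroInitMapping` and the helper `inject` are hypothetical names, and DetAny3D's actual design may differ.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply an (out_dim x in_dim) matrix by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class ZeroInitMapping:
    """Hypothetical zero-initialized linear mapping (generic technique,
    not the authors' implementation).

    Weight and bias start at exactly zero, so the mapping contributes
    nothing at initialization. The gradient of its output w.r.t. the
    weight is the input vector, so training can still move it away from zero.
    """

    def __init__(self, in_dim: int, out_dim: int) -> None:
        self.weight: Matrix = [[0.0] * in_dim for _ in range(out_dim)]
        self.bias: Vector = [0.0] * out_dim

    def __call__(self, cond: Vector) -> Vector:
        return [y + b for y, b in zip(matvec(self.weight, cond), self.bias)]


def inject(host: Vector, cond: Vector, mapping: ZeroInitMapping) -> Vector:
    """Residually add the mapped conditioning signal (e.g. 2D features)
    to the host features (e.g. 3D decoder tokens)."""
    delta = mapping(cond)
    return [h + d for h, d in zip(host, delta)]


if __name__ == "__main__":
    host = [0.5, -1.0, 2.0]  # stand-in for pre-trained features
    cond = [3.0, 4.0]        # stand-in for 2D foundation-model features
    mapping = ZeroInitMapping(in_dim=2, out_dim=3)

    # At initialization the host features pass through unchanged.
    print(inject(host, cond, mapping))  # [0.5, -1.0, 2.0]

    # After a (simulated) update, the conditioning signal starts to flow in.
    mapping.weight[0][0] = 0.5
    print(inject(host, cond, mapping))  # [2.0, -1.0, 2.0]
```

The design choice here is that the new branch begins as an exact identity on the host pathway, so early, noisy gradients from the freshly added module cannot immediately corrupt the pre-trained features.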
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Object Detection | nuScenes | -- | -- | 19 |
| 3D Object Detection | OMNI3D OUT | AP (KIT, 3D) | 38 | 18 |
| 3D Object Detection | OMNI3D | AP (3D) | 24.92 | 14 |
| 3D Object Detection | Omni3D Full Unified (test) | AP 3D (Overall) | 34.38 | 7 |
| Video 3D perception | ARKitScenes | F1 Score (IoU=0.25) | 9.4 | 7 |
| Video 3D perception | ScanNet | F1 Score (IoU=0.25) | 11.9 | 7 |
| 3D Object Detection | KITTI | IoU3D | 48.34 | 6 |
| 3D Object Detection | SUN RGB-D | IoU3D | 32.72 | 6 |
| 3D Object Detection | Hypersim | IoU3D | 10.79 | 6 |
| 3D Object Detection | Omni3D OBJECTRON domain | AP@15 | 72.51 | 6 |