
TrianguLang: Geometry-Aware Semantic Consensus for Pose-Free 3D Localization

About

Localizing objects and parts from natural language in 3D space is essential for robotics, AR, and embodied AI, yet existing methods face a trade-off between the accuracy and geometric consistency of per-scene optimization and the efficiency of feed-forward inference. We present TrianguLang, a feed-forward framework for 3D localization that requires no camera calibration at inference. Unlike prior methods that treat views independently, we introduce Geometry-Aware Semantic Attention (GASA), which uses predicted geometry to gate cross-view feature correspondence, suppressing semantically plausible but geometrically inconsistent matches without requiring ground-truth poses. Validated on five benchmarks including ScanNet++ and uCO3D, TrianguLang achieves state-of-the-art feed-forward text-guided segmentation and localization, reducing user effort from $O(N)$ clicks to a single text query. The model processes each frame at 1008×1008 resolution in $\sim$57ms ($\sim$18 FPS) without optimization, enabling practical deployment for interactive robotics and AR applications. Code and checkpoints are available at https://cwru-aism.github.io/triangulang/.
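The abstract describes GASA as gating cross-view feature correspondence with predicted geometry, so that matches which are semantically similar but geometrically inconsistent are suppressed. The following is a minimal illustrative sketch of that idea, not the paper's actual implementation: the function name, the Gaussian gate on predicted 3D point distance, and the scale parameter `tau` are all assumptions for illustration.

```python
import numpy as np

def geometry_gated_attention(feat_a, feat_b, pts_a, pts_b, tau=0.1):
    """Sketch of geometry-gated cross-view attention (hypothetical).

    feat_a: (Na, C) token features from view A; feat_b: (Nb, C) from view B.
    pts_a, pts_b: (Na, 3) and (Nb, 3) predicted 3D points in a shared frame.
    Semantic attention logits are down-weighted wherever the predicted 3D
    points of a token pair lie far apart, i.e. the match is geometrically
    inconsistent, without ever touching ground-truth camera poses.
    """
    # Semantic similarity logits (scaled dot product).
    logits = feat_a @ feat_b.T / np.sqrt(feat_a.shape[1])
    # Pairwise distances between predicted 3D points of each token pair.
    d = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    # Soft geometric gate: ~1 for nearby points, -> 0 for distant ones.
    gate = np.exp(-(d / tau) ** 2)
    gated = logits + np.log(gate + 1e-9)  # suppress inconsistent pairs
    # Row-wise softmax over view-B tokens.
    w = np.exp(gated - gated.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ feat_b, w
```

With two semantically identical view-B tokens, the one whose predicted 3D point is far from the query's receives near-zero attention weight, which is the qualitative behavior the abstract attributes to GASA.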

Bryce Grant, Aryeh Rothenberg, Atri Banerjee, Peng Wang • 2026

Related benchmarks

Task                           Dataset     Metric              Result   Rank
3D Semantic Segmentation       ScanNet++   mIoU (20 classes)   62.4     31
Semantic segmentation          ScanNet++   mIoU                62.4     15
Open-Vocabulary Segmentation   SPIn-NeRF   mIoU                91.4     8
Open-Vocabulary Segmentation   NVOS        mIoU                93.5     7
3D Semantic Segmentation       uCo3D       mIoU                94.6     6
Semantic segmentation          uCo3D       mIoU                94.6     5
Language Grounding             LERF-OVS    mIoU                58.1     4
