
ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition

About

3D visual grounding aims to identify and localize objects in a 3D scene based on textual descriptions. However, existing methods struggle to disentangle targets from anchors in complex multi-anchor queries and to resolve inconsistencies in spatial descriptions caused by perspective variations. To tackle these challenges, we propose ViewSRD, a framework that formulates 3D visual grounding as a structured multi-view decomposition process. First, the Simple Relation Decoupling (SRD) module restructures complex multi-anchor queries into a set of targeted single-anchor statements, generating a structured set of perspective-aware descriptions that clarify positional relationships. These decomposed representations serve as the foundation for the Multi-view Textual-Scene Interaction (Multi-TSI) module, which integrates textual and scene features across multiple viewpoints using shared Cross-modal Consistent View Tokens (CCVTs) to preserve spatial correlations. Finally, a Textual-Scene Reasoning module synthesizes multi-view predictions into a unified, robust 3D visual grounding result. Experiments on 3D visual grounding datasets show that ViewSRD significantly outperforms state-of-the-art methods, particularly on complex queries requiring precise spatial differentiation. Code is available at https://github.com/visualjason/ViewSRD.
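To make the decomposition idea concrete, here is a toy sketch, not the paper's SRD implementation. It assumes the relation–anchor pairs have already been extracted from the raw query (the paper's module operates on full natural-language queries); the function name `decompose_query` and its signature are illustrative inventions:

```python
# Toy illustration of splitting a multi-anchor query into
# single-anchor statements, one per (relation, anchor) pair.
# This is a hypothetical sketch, not ViewSRD's actual SRD module.

def decompose_query(target: str, relations: list[tuple[str, str]]) -> list[str]:
    """Turn a target plus (relation, anchor) pairs into single-anchor statements."""
    return [
        f"the {target} that is {rel} the {anchor}"
        for rel, anchor in relations
    ]

# Example: a query with two anchors becomes two single-anchor statements.
statements = decompose_query(
    "chair", [("next to", "desk"), ("near", "window")]
)
print(statements)
# → ['the chair that is next to the desk', 'the chair that is near the window']
```

Each resulting statement mentions exactly one anchor, which is what lets the downstream multi-view modules reason about one positional relationship at a time.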

Ronggang Huang, Haoxin Yang, Yan Cai, Xuemiao Xu, Huaidong Zhang, Shengfeng He • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| 3D Visual Grounding | Nr3D (test) | Overall Success Rate: 69.9 | 88 |
| 3D Visual Grounding | Nr3D | Overall Success Rate: 69.9 | 74 |
| 3D Visual Grounding | Sr3D (test) | Overall Accuracy: 76 | 73 |
| 3D Visual Grounding | ScanRefer Unique | Acc@0.25 (IoU=0.25): 82.1 | 24 |
| 3D Visual Grounding | ScanRefer | Acc@0.25: 37.4 | 23 |
| 3D Visual Grounding | ScanRefer (test) | Unique Accuracy: 82.1 | 21 |
| 3D Visual Grounding | ScanRefer Overall | Acc@0.25: 45.4 | 17 |
