# Part-Aware Open-Vocabulary 3D Affordance Grounding via Prototypical Semantic and Geometric Alignment

## About
Grounding natural-language questions to functionally relevant regions of 3D objects -- termed language-driven 3D affordance grounding -- is essential for embodied intelligence and human-AI interaction. Although existing methods have progressed from label-based to language-driven approaches, they still struggle with open-vocabulary generalization, fine-grained geometric alignment, and part-level semantic consistency. To address these issues, we propose a novel two-stage cross-modal framework that strengthens both semantic and geometric representations for open-vocabulary 3D affordance grounding. In the first stage, large language models generate part-aware instructions that recover missing semantics, enabling the model to link semantically similar affordances. In the second stage, we introduce two key components: Affordance Prototype Aggregation (APA), which captures cross-object geometric consistency for each affordance, and Intra-Object Relational Modeling (IORM), which refines geometric differentiation within objects to support precise semantic alignment. Extensive experiments on a newly introduced benchmark and two existing benchmarks validate the effectiveness of our method, demonstrating superior performance over prior approaches.
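To make the cross-object prototype idea concrete, here is a minimal sketch of what an Affordance Prototype Aggregation step could look like. This is an illustrative assumption, not the paper's implementation: it assumes per-point features and soft affordance masks as inputs, and computes one prototype per affordance as a mask-weighted mean of point features pooled across all objects in a batch (the cross-object consistency the abstract describes). The function name and tensor shapes are hypothetical.

```python
import numpy as np

def affordance_prototypes(features, masks):
    """Hypothetical APA-style pooling (not the paper's actual code).

    features: (B, N, D) per-point features for B objects with N points each.
    masks:    (B, N, A) soft affordance scores in [0, 1] per point/affordance.
    Returns:  (A, D) one prototype per affordance: a mask-weighted mean of
              point features pooled over ALL objects, so each prototype
              summarizes the geometry of that affordance across the batch.
    """
    # Mask-weighted sum of point features per affordance: (A, D)
    weighted = np.einsum("bna,bnd->ad", masks, features)
    # Total mask weight per affordance, guarded against empty affordances
    totals = masks.sum(axis=(0, 1))[:, None]  # (A, 1)
    return weighted / np.maximum(totals, 1e-8)

# Toy usage: 2 objects, 4 points each, 3-dim features, 2 affordances
rng = np.random.default_rng(0)
feats = rng.normal(size=(2, 4, 3))
masks = rng.uniform(size=(2, 4, 2))
protos = affordance_prototypes(feats, masks)
print(protos.shape)  # (2, 3): one prototype vector per affordance
```

In a full model, such prototypes would be compared (e.g., by cosine similarity) against per-point features of a query object to score each point for each affordance; the IORM component would then refine those scores using within-object relations.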
## Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| 3D Affordance Grounding | 3D-AffordanceLLM Full-view | mIoU | 32.15 | 8 |
| 3D Affordance Grounding | 3D-AffordanceLLM Partial-view | mIoU | 30.22 | 8 |
| 3D Affordance Grounding | LASO (Seen) | aIoU | 20.8 | 6 |
| 3D Affordance Grounding | OpenAfford Open-set Full-view | aIoU | 18.38 | 5 |
| 3D Affordance Grounding | OpenAfford Open-set Partial-view | aIoU | 15.85 | 5 |
| 3D Affordance Grounding | OpenAfford Closed-set Seen | aIoU | 19.18 | 5 |
| 3D Affordance Grounding | OpenAfford Closed-set Unseen | aIoU | 17.81 | 5 |