Affostruction: 3D Affordance Grounding with Generative Reconstruction
About
This paper addresses the problem of affordance grounding from RGBD images of an object, which aims to localize surface regions corresponding to a text query that describes an action on the object. While existing methods predict affordance regions only on visible surfaces, we propose Affostruction, a generative framework that reconstructs complete geometry from partial observations and grounds affordances on the full shape including unobserved regions. We make three core contributions: generative multi-view reconstruction via sparse voxel fusion that extrapolates unseen geometry while maintaining constant token complexity, flow-based affordance grounding that captures inherent ambiguity in affordance distributions, and affordance-driven active view selection that leverages predicted affordances for intelligent viewpoint sampling. Affostruction achieves 19.1 aIoU on affordance grounding (40.4\% improvement) and 32.67 IoU for 3D reconstruction (67.7\% improvement), enabling accurate affordance prediction on complete shapes.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| 3D Affordance Grounding | Affogato (test) | aIoU19.1 | 14 | |
| 3D Asset Reconstruction | Toys4k | CD0.2427 | 11 |